Commit 47e4079c authored by Andi Kleen, committed by Linus Torvalds

[PATCH] x86-64 merge

Lots of changes that have accumulated over the last weeks.

This makes it compile and boot again.  Lots of bug fixes, including
security fixes.  Several speedups.

Only changes x86-64 specific files.

 - Use private copy of siginfo.h (for si_band)
 - Align 32bit vsyscall coredump (from Roland McGrath)
 - First steps towards 64bit vsyscall coredump (not working yet)
 - Use in-kernel trampoline for signals
 - Merge APIC pm update from Pavel/Mikael
 - Security fix for ioperm (from i386)
 - Reenable vsyscall dumping for 32bit coredumps
 - Fix bugs in 32bit coredump that could lead to oopses.
 - Fix 64bit vsyscalls
 - Revert change in pci-gart.c: pci_alloc_consistent must use a
   hardcoded 0xffffffff mask.
 - Fix bug in noexec= option handling
 - Export fake_node
 - Cleanups from Pavel
 - Disable 32bit vsyscall coredump again.  Still has some problems.
 - Implement new noexec= and noexec32= options to give a wide choice
   of support for non executable mappings for 32bit and 64bit processes.
   The default is now to honor PROT_EXEC, but mark stack and heap
   PROT_EXEC.
 - 32bit emulation changes from Pavel: use compat_* types.
 - (2.4) Use physical address for GART register.
 - Convert debugreg array to individual members and clean up ptrace
   access.  This saves 16 bytes per task.
 - (2.4) Use new streamlined context switch code.  This avoids a
   pipeline stall and pushes the register saving to C code.
 - Save flags register in context switch
 - Clean up SMP early bootup.  Remove some unnecessary code.
 - (2.4) Process numa= option early
 - (2.4) Merge 2.4 clear_page, copy_*_user, copy_page, memcpy, memset.
   These are much faster.  clear/copy_page don't force the new page out
   of memory now which should speed up user processes.  Also full
   workaround for errata #91.
 - Some cleanup in pageattr.c code.
 - Fix warning in i387.h
 - Fix wrong PAGE_KERNEL_LARGE define.  This fixes a security hole and
   makes AGP work again.
 - Fix wrong segment exception handler to not crash.
 - Fix incorrect swapgs handling in bad iret exception handling
 - Clean up some boot printks
 - Micro optimize exception handling preamble.
 - New reboot handling.  Now supports warm reboot and reboot through
   the BIOS reboot vector.
 - (2.4) Use MTRRs by default in vesafb
 - Fix bug in put_dirty_page: use correct page permissions for the stack
 - Fix type of si_band in asm-generic/siginfo.h to match POSIX/glibc
   (needs checking with other architecture maintainers)
 - (2.4) Define ARCH_HAS_NMI_WATCHDOG
 - Minor cleanup in calling.h
 - IOMMU tuning: only flush the GART TLB when the IOMMU aperture area
   allocation wraps.  Also don't clear entries until needed.  This
   should increase IO performance for IOMMU devices greatly.  Still a
   bit experimental, handle with care.
 - Unmap the IOMMU aperture from kernel mapping to prevent unwanted CPU
   prefetches.
 - Make IOMMU_LEAK_TRACE depend on IOMMU_DEBUG
 - Fix minor bug in pci_alloc_consistent - always check against the dma
   mask of the device, not 0xffffffff.
 - Remove streaming mapping delayed flush in IOMMU: not needed anymore
   and didn't work correctly in 2.5 anyways.
 - Fix the bad pte warnings caused by the SMP/APIC bootup.
 - Forward port 2.4 fix: ioperm was changing the wrong io ports in some
   cases.
 - Minor cleanups
 - Some cleanups in pageattr.c (still buggy)
 - Fix some bugs in the AGP driver.
 - Forward port from 2.4: mask all reserved bits in debug register in
   ptrace.  Previously gdb could crash the kernel by passing invalid
   values.
 - Security fix: make sure FPU is in a defined state after an
   FXSAVE/FXRSTOR exception occurred.
 - Eats keys on panic (works around a buggy KVM)
 - Make user.h user includeable.
 - Disable sign compare warnings for gcc 3.3-hammer
 - Use DSO for 32bit vsyscalls and dump it in core dumps.  Add dwarf2
   information for the vsyscalls.
   Thanks to Richard Henderson for helping me with the nasty parts of
   it.  I had to do some changes over his patch and it's currently only
   lightly tested.  Handle with care.  This only affects 32bit programs
   that use a glibc 3.2 with sysenter support.
 - Security fixes for the 32bit ioctl handlers.  Also some simplifications
   and speedups.
 - gcc 3.3-hammer compile fixes for inline assembly
 - Remove acpi.c file corpse.
 - Lots of warning fixes
 - Disable some Dprintks to make the bootup quieter again
 - Clean up ptrace a bit (together with warning fixes)
 - Merge with i386 (handle ACPI dynamic irq entries properly)
 - Disable change_page_attr in pci-gart for now.  Strictly that's
   incorrect, need to do more testing for the root cause of the current
   IOMMU problems.
 - Update defconfig
 - Disable first prefetch in copy_user that is likely to trigger Opteron
   Errata #91
 - More irqreturn_t fixes
 - Add pte_user and fix the vsyscall ptrace hack in generic code.
   It's still partly broken
 - Port verbose MCE handler from 2.4
parent eaf7c976
@@ -652,6 +652,7 @@ config IOMMU_DEBUG
 config IOMMU_LEAK
 	bool "IOMMU leak tracing"
 	depends on DEBUG_KERNEL
+	depends on IOMMU_DEBUG
 	help
 	  Add a simple leak tracer to the IOMMU code. This is useful when you
 	  are debugging a buggy device driver that leaks IOMMU mappings.
......
@@ -46,6 +46,7 @@ CFLAGS += -pipe
 CFLAGS += -fno-reorder-blocks
 # should lower this a lot and see how much .text is saves
 CFLAGS += -finline-limit=2000
+CFLAGS += -Wno-sign-compare
 #CFLAGS += -g
 # don't enable this when you use kgdb:
 ifneq ($(CONFIG_X86_REMOTE_DEBUG),y)
......
@@ -4,7 +4,6 @@
 CONFIG_X86_64=y
 CONFIG_X86=y
 CONFIG_MMU=y
-CONFIG_SWAP=y
 CONFIG_RWSEM_GENERIC_SPINLOCK=y
 CONFIG_X86_CMPXCHG=y
 CONFIG_EARLY_PRINTK=y
@@ -18,6 +17,7 @@ CONFIG_EXPERIMENTAL=y
 #
 # General setup
 #
+CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
 # CONFIG_BSD_PROCESS_ACCT is not set
 CONFIG_SYSCTL=y
@@ -47,8 +47,12 @@ CONFIG_X86_CPUID=y
 CONFIG_X86_IO_APIC=y
 CONFIG_X86_LOCAL_APIC=y
 CONFIG_MTRR=y
-CONFIG_HUGETLB_PAGE=y
+# CONFIG_HUGETLB_PAGE is not set
 CONFIG_SMP=y
+# CONFIG_PREEMPT is not set
+CONFIG_K8_NUMA=y
+CONFIG_DISCONTIGMEM=y
+CONFIG_NUMA=y
 CONFIG_HAVE_DEC_LOCK=y
 CONFIG_NR_CPUS=8
 CONFIG_GART_IOMMU=y
@@ -222,9 +226,8 @@ CONFIG_NET=y
 #
 CONFIG_PACKET=y
 # CONFIG_PACKET_MMAP is not set
-CONFIG_NETLINK_DEV=y
+# CONFIG_NETLINK_DEV is not set
 # CONFIG_NETFILTER is not set
-CONFIG_FILTER=y
 CONFIG_UNIX=y
 # CONFIG_NET_KEY is not set
 CONFIG_INET=y
@@ -239,8 +242,9 @@ CONFIG_IP_MULTICAST=y
 # CONFIG_SYN_COOKIES is not set
 # CONFIG_INET_AH is not set
 # CONFIG_INET_ESP is not set
-# CONFIG_XFRM_USER is not set
+# CONFIG_INET_IPCOMP is not set
 # CONFIG_IPV6 is not set
+# CONFIG_XFRM_USER is not set
 
 #
 # SCTP Configuration (EXPERIMENTAL)
@@ -331,6 +335,11 @@ CONFIG_E1000=m
 # CONFIG_R8169 is not set
 # CONFIG_SK98LIN is not set
 CONFIG_TIGON3=y
+
+#
+# Ethernet (10000 Mbit)
+#
+# CONFIG_IXGB is not set
 # CONFIG_FDDI is not set
 # CONFIG_HIPPI is not set
 # CONFIG_PPP is not set
@@ -405,15 +414,7 @@ CONFIG_KEYBOARD_ATKBD=y
 CONFIG_INPUT_MOUSE=y
 CONFIG_MOUSE_PS2=y
 # CONFIG_MOUSE_SERIAL is not set
-CONFIG_INPUT_JOYSTICK=y
-# CONFIG_JOYSTICK_IFORCE is not set
-# CONFIG_JOYSTICK_WARRIOR is not set
-# CONFIG_JOYSTICK_MAGELLAN is not set
-# CONFIG_JOYSTICK_SPACEORB is not set
-# CONFIG_JOYSTICK_SPACEBALL is not set
-# CONFIG_JOYSTICK_STINGER is not set
-# CONFIG_JOYSTICK_TWIDDLER is not set
-# CONFIG_INPUT_JOYDUMP is not set
+# CONFIG_INPUT_JOYSTICK is not set
 # CONFIG_INPUT_TOUCHSCREEN is not set
 # CONFIG_INPUT_MISC is not set
 
@@ -452,6 +453,7 @@ CONFIG_UNIX98_PTY_COUNT=256
 #
 # I2C Hardware Sensors Chip support
 #
+# CONFIG_I2C_SENSOR is not set
 
 #
 # Mice
@@ -468,8 +470,7 @@ CONFIG_UNIX98_PTY_COUNT=256
 # Watchdog Cards
 #
 # CONFIG_WATCHDOG is not set
-# CONFIG_INTEL_RNG is not set
-# CONFIG_AMD_RNG is not set
+CONFIG_HW_RANDOM=y
 # CONFIG_NVRAM is not set
 CONFIG_RTC=y
 # CONFIG_DTLK is not set
@@ -481,8 +482,8 @@ CONFIG_RTC=y
 # Ftape, the floppy tape device driver
 #
 # CONFIG_FTAPE is not set
-# CONFIG_AGP is not set
-# CONFIG_AGP_GART is not set
+CONFIG_AGP=y
+CONFIG_AGP_AMD_8151=y
 # CONFIG_DRM is not set
 # CONFIG_MWAVE is not set
 CONFIG_RAW_DRIVER=y
@@ -497,58 +498,76 @@ CONFIG_RAW_DRIVER=y
 #
 # CONFIG_VIDEO_DEV is not set
 
+#
+# Digital Video Broadcasting Devices
+#
+# CONFIG_DVB is not set
+
 #
 # File systems
 #
-# CONFIG_QUOTA is not set
-CONFIG_AUTOFS_FS=y
-# CONFIG_AUTOFS4_FS is not set
+CONFIG_EXT2_FS=y
+# CONFIG_EXT2_FS_XATTR is not set
+CONFIG_EXT3_FS=y
+# CONFIG_EXT3_FS_XATTR is not set
+CONFIG_JBD=y
+# CONFIG_JBD_DEBUG is not set
 CONFIG_REISERFS_FS=y
 # CONFIG_REISERFS_CHECK is not set
 # CONFIG_REISERFS_PROC_INFO is not set
+# CONFIG_JFS_FS is not set
+CONFIG_XFS_FS=m
+# CONFIG_XFS_RT is not set
+# CONFIG_XFS_QUOTA is not set
+# CONFIG_XFS_POSIX_ACL is not set
+# CONFIG_MINIX_FS is not set
+# CONFIG_ROMFS_FS is not set
+# CONFIG_QUOTA is not set
+CONFIG_AUTOFS_FS=y
+# CONFIG_AUTOFS4_FS is not set
+
+#
+# CD-ROM/DVD Filesystems
+#
+CONFIG_ISO9660_FS=y
+# CONFIG_JOLIET is not set
+# CONFIG_ZISOFS is not set
+# CONFIG_UDF_FS is not set
+
+#
+# DOS/FAT/NT Filesystems
+#
+# CONFIG_FAT_FS is not set
+# CONFIG_NTFS_FS is not set
+
+#
+# Pseudo filesystems
+#
+CONFIG_PROC_FS=y
+# CONFIG_DEVFS_FS is not set
+CONFIG_DEVPTS_FS=y
+CONFIG_TMPFS=y
+CONFIG_RAMFS=y
+
+#
+# Miscellaneous filesystems
+#
 # CONFIG_ADFS_FS is not set
 # CONFIG_AFFS_FS is not set
 # CONFIG_HFS_FS is not set
 # CONFIG_BEFS_FS is not set
 # CONFIG_BFS_FS is not set
-CONFIG_EXT3_FS=y
-# CONFIG_EXT3_FS_XATTR is not set
-CONFIG_JBD=y
-# CONFIG_JBD_DEBUG is not set
-# CONFIG_FAT_FS is not set
 # CONFIG_EFS_FS is not set
 # CONFIG_CRAMFS is not set
-CONFIG_TMPFS=y
-CONFIG_RAMFS=y
-CONFIG_HUGETLBFS=y
-CONFIG_ISO9660_FS=y
-# CONFIG_JOLIET is not set
-# CONFIG_ZISOFS is not set
-# CONFIG_JFS_FS is not set
-# CONFIG_MINIX_FS is not set
 # CONFIG_VXFS_FS is not set
-# CONFIG_NTFS_FS is not set
 # CONFIG_HPFS_FS is not set
-CONFIG_PROC_FS=y
-# CONFIG_DEVFS_FS is not set
-CONFIG_DEVPTS_FS=y
 # CONFIG_QNX4FS_FS is not set
-# CONFIG_ROMFS_FS is not set
-CONFIG_EXT2_FS=y
-# CONFIG_EXT2_FS_XATTR is not set
 # CONFIG_SYSV_FS is not set
-# CONFIG_UDF_FS is not set
 # CONFIG_UFS_FS is not set
-CONFIG_XFS_FS=m
-# CONFIG_XFS_RT is not set
-# CONFIG_XFS_QUOTA is not set
-# CONFIG_XFS_POSIX_ACL is not set
 
 #
 # Network File Systems
 #
-# CONFIG_CODA_FS is not set
-# CONFIG_INTERMEZZO_FS is not set
 CONFIG_NFS_FS=y
 CONFIG_NFS_V3=y
 # CONFIG_NFS_V4 is not set
@@ -556,14 +575,16 @@ CONFIG_NFSD=y
 CONFIG_NFSD_V3=y
 # CONFIG_NFSD_V4 is not set
 CONFIG_NFSD_TCP=y
-CONFIG_SUNRPC=y
-# CONFIG_SUNRPC_GSS is not set
 CONFIG_LOCKD=y
 CONFIG_LOCKD_V4=y
 CONFIG_EXPORTFS=y
-# CONFIG_CIFS is not set
+CONFIG_SUNRPC=y
+# CONFIG_SUNRPC_GSS is not set
 # CONFIG_SMB_FS is not set
+# CONFIG_CIFS is not set
 # CONFIG_NCP_FS is not set
+# CONFIG_CODA_FS is not set
+# CONFIG_INTERMEZZO_FS is not set
 # CONFIG_AFS_FS is not set
 
 #
@@ -594,6 +615,7 @@ CONFIG_DUMMY_CONSOLE=y
 # USB support
 #
 # CONFIG_USB is not set
+# CONFIG_USB_GADGET is not set
 
 #
 # Bluetooth support
@@ -615,6 +637,9 @@ CONFIG_MAGIC_SYSRQ=y
 # CONFIG_INIT_DEBUG is not set
 CONFIG_KALLSYMS=y
 # CONFIG_FRAME_POINTER is not set
+CONFIG_IOMMU_DEBUG=y
+CONFIG_IOMMU_LEAK=y
+CONFIG_MCE_DEBUG=y
 
 #
 # Security options
......
@@ -5,3 +5,12 @@
 obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_ioctl.o \
 	ia32_signal.o tls32.o \
 	ia32_binfmt.o fpu32.o ptrace32.o ipc32.o syscall32.o
+
+$(obj)/syscall32.o: $(src)/syscall32.c $(obj)/vsyscall.so
+
+# The DSO images are built using a special linker script.
+$(obj)/vsyscall.so: $(src)/vsyscall.lds $(obj)/vsyscall.o
+	$(CC) -m32 -nostdlib -shared -s -Wl,-soname=linux-vsyscall.so.1 \
+		-o $@ -Wl,-T,$^
+
+AFLAGS_vsyscall.o = -m32
@@ -6,11 +6,11 @@
  * of ugly preprocessor tricks. Talk about very very poor man's inheritance.
  */
 #include <linux/types.h>
+#include <linux/compat.h>
 #include <linux/config.h>
 #include <linux/stddef.h>
 #include <linux/rwsem.h>
 #include <linux/sched.h>
-#include <linux/compat.h>
 #include <linux/string.h>
 #include <linux/binfmts.h>
 #include <linux/mm.h>
@@ -23,12 +23,17 @@
 #include <asm/i387.h>
 #include <asm/uaccess.h>
 #include <asm/ia32.h>
+#include <asm/vsyscall32.h>
 
 #define ELF_NAME "elf/i386"
 
 #define AT_SYSINFO 32
+#define AT_SYSINFO_EHDR 33
 
-#define ARCH_DLINFO NEW_AUX_ENT(AT_SYSINFO, 0xffffe000)
+#define ARCH_DLINFO do { \
+	NEW_AUX_ENT(AT_SYSINFO, (u32)(u64)VSYSCALL32_VSYSCALL); \
+	NEW_AUX_ENT(AT_SYSINFO_EHDR, VSYSCALL32_BASE); \
+} while(0)
 
 struct file;
 struct elf_phdr;
@@ -54,6 +59,47 @@ typedef unsigned int elf_greg_t;
 #define ELF_NGREG (sizeof (struct user_regs_struct32) / sizeof(elf_greg_t))
 typedef elf_greg_t elf_gregset_t[ELF_NGREG];
 
+/*
+ * These macros parameterize elf_core_dump in fs/binfmt_elf.c to write out
+ * extra segments containing the vsyscall DSO contents. Dumping its
+ * contents makes post-mortem fully interpretable later without matching up
+ * the same kernel and hardware config to see what PC values meant.
+ * Dumping its extra ELF program headers includes all the other information
+ * a debugger needs to easily find how the vsyscall DSO was being used.
+ */
+#define ELF_CORE_EXTRA_PHDRS (VSYSCALL32_EHDR->e_phnum)
+#define ELF_CORE_WRITE_EXTRA_PHDRS \
+do { \
+	const struct elf32_phdr *const vsyscall_phdrs = \
+		(const struct elf32_phdr *) (VSYSCALL32_BASE \
+					     + VSYSCALL32_EHDR->e_phoff); \
+	int i; \
+	Elf32_Off ofs = 0; \
+	for (i = 0; i < VSYSCALL32_EHDR->e_phnum; ++i) { \
+		struct elf_phdr phdr = vsyscall_phdrs[i]; \
+		if (phdr.p_type == PT_LOAD) { \
+			ofs = phdr.p_offset = offset; \
+			offset += phdr.p_filesz; \
+		} \
+		else \
+			phdr.p_offset += ofs; \
+		phdr.p_paddr = 0; /* match other core phdrs */ \
+		DUMP_WRITE(&phdr, sizeof(phdr)); \
+	} \
+} while (0)
+#define ELF_CORE_WRITE_EXTRA_DATA \
+do { \
+	const struct elf32_phdr *const vsyscall_phdrs = \
+		(const struct elf32_phdr *) (VSYSCALL32_BASE \
+					     + VSYSCALL32_EHDR->e_phoff); \
+	int i; \
+	for (i = 0; i < VSYSCALL32_EHDR->e_phnum; ++i) { \
+		if (vsyscall_phdrs[i].p_type == PT_LOAD) \
+			DUMP_WRITE((void *) (u64) vsyscall_phdrs[i].p_vaddr, \
+				   vsyscall_phdrs[i].p_filesz); \
+	} \
+} while (0)
+
 struct elf_siginfo
 {
 	int si_signo; /* signal number */
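
The offset fix-up performed by ELF_CORE_WRITE_EXTRA_PHDRS in the hunk above can be looked at in isolation. The following is only a sketch under stated assumptions: `struct mini_phdr` and `relocate_extra_phdrs` are hypothetical stand-ins invented for illustration, the headers are copied to an output array instead of being written with DUMP_WRITE, and, as in the macro, headers other than PT_LOAD are assumed to carry offsets relative to a load segment that starts at file offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1u  /* ELF program header type for loadable segments */

/* Hypothetical reduced program header: only the fields the macro touches. */
struct mini_phdr {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_filesz;
    uint32_t p_paddr;
};

/*
 * Copy the DSO's program headers into `out`, rewriting file offsets so they
 * are valid inside the core file. `offset` is the position in the core file
 * where the next segment's data will be written. Returns the updated offset.
 */
static uint32_t relocate_extra_phdrs(const struct mini_phdr *in,
                                     struct mini_phdr *out,
                                     size_t n, uint32_t offset)
{
    uint32_t ofs = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        struct mini_phdr phdr = in[i];

        if (phdr.p_type == PT_LOAD) {
            /* The segment data lands at the current core file offset. */
            ofs = phdr.p_offset = offset;
            offset += phdr.p_filesz;
        } else {
            /* Other headers point into the load segment, so shift them. */
            phdr.p_offset += ofs;
        }
        phdr.p_paddr = 0; /* match other core phdrs */
        out[i] = phdr;
    }
    return offset;
}
```

With a PT_LOAD header of size 0x1000 and a second header at DSO offset 0x200, starting from core offset 0x5000, the rewritten offsets would be 0x5000 and 0x5200 and the returned offset 0x6000.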
@@ -157,7 +203,6 @@ elf_core_copy_task_fpregs(struct task_struct *tsk, elf_fpregset_t *fpu)
 	struct _fpstate_ia32 *fpstate = (void*)fpu;
 	struct pt_regs *regs = (struct pt_regs *)(tsk->thread.rsp0);
 	mm_segment_t oldfs = get_fs();
-	int ret;
 
 	if (!tsk->used_math)
 		return 0;
@@ -165,12 +210,12 @@ elf_core_copy_task_fpregs(struct task_struct *tsk, elf_fpregset_t *fpu)
 	if (tsk == current)
 		unlazy_fpu(tsk);
 	set_fs(KERNEL_DS);
-	ret = save_i387_ia32(tsk, fpstate, regs, 1);
+	save_i387_ia32(tsk, fpstate, regs, 1);
 	/* Correct for i386 bug. It puts the fop into the upper 16bits of
 	   the tag word (like FXSAVE), not into the fcs*/
 	fpstate->cssel |= fpstate->tag & 0xffff0000;
 	set_fs(oldfs);
-	return ret;
+	return 1;
 }
 
 #define ELF_CORE_COPY_XFPREGS 1
@@ -302,8 +347,9 @@ int setup_arg_pages(struct linux_binprm *bprm)
 		mpnt->vm_mm = mm;
 		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
 		mpnt->vm_end = IA32_STACK_TOP;
-		mpnt->vm_page_prot = PAGE_COPY_EXEC;
-		mpnt->vm_flags = VM_STACK_FLAGS;
+		mpnt->vm_flags = vm_stack_flags32;
+		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ?
+			PAGE_COPY_EXEC : PAGE_COPY;
 		mpnt->vm_ops = NULL;
 		mpnt->vm_pgoff = 0;
 		mpnt->vm_file = NULL;
@@ -333,7 +379,7 @@ elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int p
 	struct task_struct *me = current;
 
 	if (prot & PROT_READ)
-		prot |= PROT_EXEC;
+		prot |= vm_force_exec32;
 
 	down_write(&me->mm->mmap_sem);
 	map_addr = do_mmap(filep, ELF_PAGESTART(addr),
This diff is collapsed.
@@ -33,6 +33,7 @@
 #include <asm/sigcontext32.h>
 #include <asm/fpu32.h>
 #include <asm/proto.h>
+#include <asm/vsyscall32.h>
 
 #define ptr_to_u32(x) ((u32)(u64)(x)) /* avoid gcc warning */
 
@@ -428,7 +429,7 @@ void ia32_setup_frame(int sig, struct k_sigaction *ka,
 
 	/* Return stub is in 32bit vsyscall page */
 	{
-		void *restorer = syscall32_page + 32;
+		void *restorer = VSYSCALL32_SIGRETURN;
 		if (ka->sa.sa_flags & SA_RESTORER)
 			restorer = ka->sa.sa_restorer;
 		err |= __put_user(ptr_to_u32(restorer), &frame->pretcode);
@@ -521,7 +522,7 @@ void ia32_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 
 	{
-		void *restorer = syscall32_page + 32;
+		void *restorer = VSYSCALL32_RTSIGRETURN;
 		if (ka->sa.sa_flags & SA_RESTORER)
 			restorer = ka->sa.sa_restorer;
 		err |= __put_user(ptr_to_u32(restorer), &frame->pretcode);
......
@@ -182,9 +182,9 @@ quiet_ni_syscall:
 	PTREGSCALL stub32_sigaltstack, sys32_sigaltstack
 	PTREGSCALL stub32_sigsuspend, sys32_sigsuspend
 	PTREGSCALL stub32_execve, sys32_execve
-	PTREGSCALL stub32_fork, sys32_fork
+	PTREGSCALL stub32_fork, sys_fork
 	PTREGSCALL stub32_clone, sys32_clone
-	PTREGSCALL stub32_vfork, sys32_vfork
+	PTREGSCALL stub32_vfork, sys_vfork
 	PTREGSCALL stub32_iopl, sys_iopl
 	PTREGSCALL stub32_rt_sigsuspend, sys_rt_sigsuspend
......
@@ -78,12 +78,24 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
 	case offsetof(struct user32, u_debugreg[5]):
 		return -EIO;
 
-	case offsetof(struct user32, u_debugreg[0]) ...
-		offsetof(struct user32, u_debugreg[3]):
+	case offsetof(struct user32, u_debugreg[0]):
+		child->thread.debugreg0 = val;
+		break;
+
+	case offsetof(struct user32, u_debugreg[1]):
+		child->thread.debugreg1 = val;
+		break;
+
+	case offsetof(struct user32, u_debugreg[2]):
+		child->thread.debugreg2 = val;
+		break;
+
+	case offsetof(struct user32, u_debugreg[3]):
+		child->thread.debugreg3 = val;
+		break;
+
 	case offsetof(struct user32, u_debugreg[6]):
-		child->thread.debugreg
-		[(regno-offsetof(struct user32, u_debugreg[0]))/4]
-			= val;
+		child->thread.debugreg6 = val;
 		break;
 
 	case offsetof(struct user32, u_debugreg[7]):
@@ -92,7 +104,7 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
 		for(i=0; i<4; i++)
 			if ((0x5454 >> ((val >> (16 + 4*i)) & 0xf)) & 1)
 				return -EIO;
-		child->thread.debugreg[7] = val;
+		child->thread.debugreg7 = val;
 		break;
 
 	default:
@@ -142,8 +154,23 @@ static int getreg32(struct task_struct *child, unsigned regno, u32 *val)
 	R32(eflags, eflags);
 	R32(esp, rsp);
 
-	case offsetof(struct user32, u_debugreg[0]) ... offsetof(struct user32, u_debugreg[7]):
-		*val = child->thread.debugreg[(regno-offsetof(struct user32, u_debugreg[0]))/4];
+	case offsetof(struct user32, u_debugreg[0]):
+		*val = child->thread.debugreg0;
+		break;
+	case offsetof(struct user32, u_debugreg[1]):
+		*val = child->thread.debugreg1;
+		break;
+	case offsetof(struct user32, u_debugreg[2]):
+		*val = child->thread.debugreg2;
+		break;
+	case offsetof(struct user32, u_debugreg[3]):
+		*val = child->thread.debugreg3;
+		break;
+	case offsetof(struct user32, u_debugreg[6]):
+		*val = child->thread.debugreg6;
+		break;
+	case offsetof(struct user32, u_debugreg[7]):
+		*val = child->thread.debugreg7;
 		break;
 
 	default:
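
The `0x5454` constant in the putreg32() check for debug register 7 is a 16-entry bitmap indexed by each breakpoint's 4-bit control nibble (R/W field in the low two bits, LEN field in the high two). A minimal sketch of what that bitmap accepts and rejects follows; the function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a single 4-bit LEN:R/W combination is flagged by the bitmap. */
static bool nibble_rejected(unsigned nibble)
{
    assert(nibble <= 0xf);
    return ((0x5454 >> nibble) & 1) != 0;
}

/*
 * True when any of the four breakpoint control nibbles of a DR7 value is
 * flagged. Nibble i occupies bits 16+4*i .. 19+4*i, the same layout the
 * loop in putreg32() walks.
 */
static bool dr7_has_rejected_nibble(uint32_t val)
{
    for (int i = 0; i < 4; i++) {
        unsigned nibble = (val >> (16 + 4 * i)) & 0xf;

        if (nibble_rejected(nibble))
            return true;
    }
    return false;
}
```

Decoding the constant bit by bit, the flagged nibbles are 2, 4, 6, 10, 12 and 14: every combination with R/W = 10b, plus R/W = 00b with LEN = 01b or 11b. A nibble of 8 (R/W = 00b, LEN = 10b) is not flagged by this bitmap.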
......
@@ -234,7 +234,7 @@ sys32_mmap(struct mmap_arg_struct *arg)
 	}
 
 	if (a.prot & PROT_READ)
-		a.prot |= PROT_EXEC;
+		a.prot |= vm_force_exec32;
 
 	mm = current->mm;
 	down_write(&mm->mmap_sem);
@@ -253,7 +253,7 @@ asmlinkage long
 sys32_mprotect(unsigned long start, size_t len, unsigned long prot)
 {
 	if (prot & PROT_READ)
-		prot |= PROT_EXEC;
+		prot |= vm_force_exec32;
 	return sys_mprotect(start,len,prot);
 }
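
The mmap and mprotect hunks above replace a hard-coded `PROT_EXEC` with the `vm_force_exec32` variable, so the "readable implies executable" behaviour for 32bit processes becomes tunable. The sketch below illustrates the pattern only. It assumes the variable holds either the exec bit or 0, which the diff does not show; `force_exec32`, `adjust_prot32` and the `SK_` constants are hypothetical names, and the initial value is arbitrary.

```c
#include <assert.h>

/* Local copies of the protection bits so the sketch needs no system
   headers; the values match Linux's PROT_READ and PROT_EXEC. */
#define SK_PROT_READ 0x1
#define SK_PROT_EXEC 0x4

/* Hypothetical stand-in for vm_force_exec32 (assumed to be either
   SK_PROT_EXEC or 0): SK_PROT_EXEC keeps "readable implies executable",
   0 leaves the caller's request unchanged. */
static int force_exec32 = SK_PROT_EXEC;

/* Apply the same adjustment the 32bit mmap/mprotect wrappers perform. */
static int adjust_prot32(int prot)
{
    assert((force_exec32 & ~SK_PROT_EXEC) == 0);
    if (prot & SK_PROT_READ)
        prot |= force_exec32;
    return prot;
}
```

With the variable set to 0, a plain `SK_PROT_READ` request stays non-executable, while an explicit exec request still passes through untouched.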
@@ -929,7 +929,11 @@ struct sysinfo32 {
 	u32 totalswap;
 	u32 freeswap;
 	unsigned short procs;
-	char _f[22];
+	unsigned short pad;
+	u32 totalhigh;
+	u32 freehigh;
+	u32 mem_unit;
+	char _f[20-2*sizeof(u32)-sizeof(int)];
 };
 
 extern asmlinkage long sys_sysinfo(struct sysinfo *info);
@@ -955,7 +959,10 @@ sys32_sysinfo(struct sysinfo32 *info)
 	    __put_user (s.bufferram, &info->bufferram) ||
 	    __put_user (s.totalswap, &info->totalswap) ||
 	    __put_user (s.freeswap, &info->freeswap) ||
-	    __put_user (s.procs, &info->procs))
+	    __put_user (s.procs, &info->procs) ||
+	    __put_user (s.totalhigh, &info->totalhigh) ||
+	    __put_user (s.freehigh, &info->freehigh) ||
+	    __put_user (s.mem_unit, &info->mem_unit))
 		return -EFAULT;
 	return 0;
 }
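
The new `_f` size expression in struct sysinfo32 keeps the structure at the same total size while carving out room for the added fields. A compile-time check along the following lines makes the arithmetic visible. This is a sketch under assumptions: the members before `totalswap` are not part of the hunk and are filled in here from the usual i386 `struct sysinfo` layout, and `int` is assumed to be 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

struct sysinfo32 {
    int32_t uptime;          /* assumed: not shown in the hunk */
    u32 loads[3];            /* assumed */
    u32 totalram;            /* assumed */
    u32 freeram;             /* assumed */
    u32 sharedram;           /* assumed */
    u32 bufferram;
    u32 totalswap;
    u32 freeswap;
    unsigned short procs;
    unsigned short pad;
    u32 totalhigh;
    u32 freehigh;
    u32 mem_unit;
    char _f[20 - 2 * sizeof(u32) - sizeof(int)];
};

/* 20 - 8 - 4 leaves 8 bytes of padding, for a 64-byte structure. */
static_assert(sizeof(((struct sysinfo32 *)0)->_f) == 8, "padding is 8 bytes");
static_assert(sizeof(struct sysinfo32) == 64, "layout is 64 bytes");
static_assert(offsetof(struct sysinfo32, totalhigh) == 44, "totalhigh follows pad");
```

Under those assumptions the old `char _f[22]` and the new tail (`pad`, three `u32` fields, 8 bytes of `_f`) both occupy 22 bytes after `procs`.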
@@ -1419,7 +1426,7 @@ asmlinkage long sys32_mmap2(unsigned long addr, unsigned long len,
 	}
 
 	if (prot & PROT_READ)
-		prot |= PROT_EXEC;
+		prot |= vm_force_exec32;
 
 	down_write(&mm->mmap_sem);
 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
@@ -1587,40 +1594,14 @@ asmlinkage long sys32_execve(char *name, u32 argv, u32 envp, struct pt_regs regs
 	return ret;
 }
 
-asmlinkage long sys32_fork(struct pt_regs regs)
-{
-	struct task_struct *p;
-	p = do_fork(SIGCHLD, regs.rsp, &regs, 0, NULL, NULL);
-	return IS_ERR(p) ? PTR_ERR(p) : p->pid;
-}
-
 asmlinkage long sys32_clone(unsigned int clone_flags, unsigned int newsp, struct pt_regs regs)
 {
-	struct task_struct *p;
 	void *parent_tid = (void *)regs.rdx;
 	void *child_tid = (void *)regs.rdi;
 	if (!newsp)
 		newsp = regs.rsp;
-	p = do_fork(clone_flags & ~CLONE_IDLETASK, newsp, &regs, 0,
+	return do_fork(clone_flags & ~CLONE_IDLETASK, newsp, &regs, 0,
 		    parent_tid, child_tid);
-	return IS_ERR(p) ? PTR_ERR(p) : p->pid;
-}
-
-/*
- * This is trivial, and on the face of it looks like it
- * could equally well be done in user mode.
- *
- * Not so, for quite unobvious reasons - register pressure.
- * In user mode vfork() cannot have a stack frame, and if
- * done by calling the "clone()" system call directly, you
- * do not have enough call-clobbered registers to hold all
- * the information you need.
- */
-asmlinkage long sys32_vfork(struct pt_regs regs)
-{
-	struct task_struct *p;
-	p = do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.rsp, &regs, 0, NULL, NULL);
-	return IS_ERR(p) ? PTR_ERR(p) : p->pid;
 }
 
 /*
......
@@ -13,33 +13,14 @@
 #include <asm/tlbflush.h>
 #include <asm/ia32_unistd.h>
 
-/* 32bit SYSCALL stub mapped into user space. */
-asm(" .code32\n"
-    "\nsyscall32:\n"
-    " pushl %ebp\n"
-    " movl %ecx,%ebp\n"
-    " syscall\n"
-    " popl %ebp\n"
-    " ret\n"
+/* 32bit VDSO mapped into user space. */
+asm(".section \".init.data\",\"aw\"\n"
+    "syscall32:\n"
+    ".incbin \"arch/x86_64/ia32/vsyscall.so\"\n"
     "syscall32_end:\n"
-    /* signal trampolines */
-    "sig32_rt_tramp:\n"
-    " movl $" __stringify(__NR_ia32_rt_sigreturn) ",%eax\n"
-    " syscall\n"
-    "sig32_rt_tramp_end:\n"
-    "sig32_tramp:\n"
-    " popl %eax\n"
-    " movl $" __stringify(__NR_ia32_sigreturn) ",%eax\n"
-    " syscall\n"
-    "sig32_tramp_end:\n"
-    " .code64\n");
+    ".previous");
 
 extern unsigned char syscall32[], syscall32_end[];
-extern unsigned char sig32_rt_tramp[], sig32_rt_tramp_end[];
-extern unsigned char sig32_tramp[], sig32_tramp_end[];
 
 char *syscall32_page;
 
@@ -76,10 +57,6 @@ static int __init init_syscall32(void)
 		panic("Cannot allocate syscall32 page");
 	SetPageReserved(virt_to_page(syscall32_page));
 	memcpy(syscall32_page, syscall32, syscall32_end - syscall32);
-	memcpy(syscall32_page + 32, sig32_rt_tramp,
-	       sig32_rt_tramp_end - sig32_rt_tramp);
-	memcpy(syscall32_page + 64, sig32_tramp,
-	       sig32_tramp_end - sig32_tramp);
 	return 0;
 }
......
/*
* Code for the vsyscall page. This version uses the syscall instruction.
*/
#include <asm/ia32_unistd.h>
#include <asm/offset.h>
.text
.section .text.vsyscall,"ax"
.globl __kernel_vsyscall
.type __kernel_vsyscall,@function
__kernel_vsyscall:
.LSTART_vsyscall:
push %ebp
.Lpush_ebp:
movl %ecx, %ebp
syscall
popl %ebp
.Lpop_ebp:
ret
.LEND_vsyscall:
.size __kernel_vsyscall,.-.LSTART_vsyscall
.section .text.sigreturn,"ax"
.balign 32
.globl __kernel_sigreturn
.type __kernel_sigreturn,@function
__kernel_sigreturn:
.LSTART_sigreturn:
popl %eax
movl $__NR_ia32_sigreturn, %eax
syscall
.LEND_sigreturn:
.size __kernel_sigreturn,.-.LSTART_sigreturn
.section .text.rtsigreturn,"ax"
.balign 32
.globl __kernel_rt_sigreturn,"ax"
.type __kernel_rt_sigreturn,@function
__kernel_rt_sigreturn:
.LSTART_rt_sigreturn:
movl $__NR_ia32_rt_sigreturn, %eax
syscall
.LEND_rt_sigreturn:
.size __kernel_rt_sigreturn,.-.LSTART_rt_sigreturn
.section .eh_frame,"a",@progbits
.LSTARTFRAME:
.long .LENDCIE-.LSTARTCIE
.LSTARTCIE:
.long 0 /* CIE ID */
.byte 1 /* Version number */
.string "zR" /* NUL-terminated augmentation string */
.uleb128 1 /* Code alignment factor */
.sleb128 -4 /* Data alignment factor */
.byte 8 /* Return address register column */
.uleb128 1 /* Augmentation value length */
.byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
.byte 0x0c /* DW_CFA_def_cfa */
.uleb128 4
.uleb128 4
.byte 0x88 /* DW_CFA_offset, column 0x8 */
.uleb128 1
.align 4
.LENDCIE:
.long .LENDFDE1-.LSTARTFDE1 /* Length FDE */
.LSTARTFDE1:
.long .LSTARTFDE1-.LSTARTFRAME /* CIE pointer */
.long .LSTART_vsyscall-. /* PC-relative start address */
.long .LEND_vsyscall-.LSTART_vsyscall
.uleb128 0 /* Augmentation length */
/* What follows are the instructions for the table generation.
We have to record all changes of the stack pointer. */
.byte 0x40 + .Lpush_ebp-.LSTART_vsyscall /* DW_CFA_advance_loc */
.byte 0x0e /* DW_CFA_def_cfa_offset */
.uleb128 8
.byte 0x85, 0x02 /* DW_CFA_offset %ebp -8 */
.byte 0x40 + .Lpop_ebp-.Lpush_ebp /* DW_CFA_advance_loc */
.byte 0xc5 /* DW_CFA_restore %ebp */
.byte 0x0e /* DW_CFA_def_cfa_offset */
.uleb128 4
.align 4
.LENDFDE1:
.long .LENDFDE2-.LSTARTFDE2 /* Length FDE */
.LSTARTFDE2:
.long .LSTARTFDE2-.LSTARTFRAME /* CIE pointer */
/* HACK: The dwarf2 unwind routines will subtract 1 from the
return address to get an address in the middle of the
presumed call instruction. Since we didn't get here via
a call, we need to include the nop before the real start
to make up for it. */
.long .LSTART_sigreturn-1-. /* PC-relative start address */
.long .LEND_sigreturn-.LSTART_sigreturn+1
.uleb128 0 /* Augmentation length */
/* What follows are the instructions for the table generation.
We record the locations of each register saved. This is
complicated by the fact that the "CFA" is always assumed to
be the value of the stack pointer in the caller. This means
that we must define the CFA of this body of code to be the
saved value of the stack pointer in the sigcontext. Which
also means that there is no fixed relation to the other
saved registers, which means that we must use DW_CFA_expression
to compute their addresses. It also means that when we
adjust the stack with the popl, we have to do it all over again. */
#define do_cfa_expr(offset) \
.byte 0x0f; /* DW_CFA_def_cfa_expression */ \
.uleb128 1f-0f; /* length */ \
0: .byte 0x74; /* DW_OP_breg4 */ \
.sleb128 offset; /* offset */ \
.byte 0x06; /* DW_OP_deref */ \
1:
#define do_expr(regno, offset) \
.byte 0x10; /* DW_CFA_expression */ \
.uleb128 regno; /* regno */ \
.uleb128 1f-0f; /* length */ \
0: .byte 0x74; /* DW_OP_breg4 */ \
.sleb128 offset; /* offset */ \
1:
do_cfa_expr(IA32_SIGCONTEXT_esp+4)
do_expr(0, IA32_SIGCONTEXT_eax+4)
do_expr(1, IA32_SIGCONTEXT_ecx+4)
do_expr(2, IA32_SIGCONTEXT_edx+4)
do_expr(3, IA32_SIGCONTEXT_ebx+4)
do_expr(5, IA32_SIGCONTEXT_ebp+4)
do_expr(6, IA32_SIGCONTEXT_esi+4)
do_expr(7, IA32_SIGCONTEXT_edi+4)
do_expr(8, IA32_SIGCONTEXT_eip+4)
.byte 0x42 /* DW_CFA_advance_loc 2 -- nop; popl eax. */
do_cfa_expr(IA32_SIGCONTEXT_esp)
do_expr(0, IA32_SIGCONTEXT_eax)
do_expr(1, IA32_SIGCONTEXT_ecx)
do_expr(2, IA32_SIGCONTEXT_edx)
do_expr(3, IA32_SIGCONTEXT_ebx)
do_expr(5, IA32_SIGCONTEXT_ebp)
do_expr(6, IA32_SIGCONTEXT_esi)
do_expr(7, IA32_SIGCONTEXT_edi)
do_expr(8, IA32_SIGCONTEXT_eip)
.align 4
.LENDFDE2:
.long .LENDFDE3-.LSTARTFDE3 /* Length FDE */
.LSTARTFDE3:
.long .LSTARTFDE3-.LSTARTFRAME /* CIE pointer */
/* HACK: See above wrt unwind library assumptions. */
.long .LSTART_rt_sigreturn-1-. /* PC-relative start address */
.long .LEND_rt_sigreturn-.LSTART_rt_sigreturn+1
.uleb128 0 /* Augmentation length */
/* What follows are the instructions for the table generation.
We record the locations of each register saved. This is
slightly less complicated than the above, since we don't
modify the stack pointer in the process. */
do_cfa_expr(IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_esp)
do_expr(0, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_eax)
do_expr(1, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ecx)
do_expr(2, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_edx)
do_expr(3, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ebx)
do_expr(5, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ebp)
do_expr(6, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_esi)
do_expr(7, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_edi)
do_expr(8, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_eip)
.align 4
.LENDFDE3:
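The raw `.byte` values in the FDEs above (0x42, 0x85, 0x88, 0xc5) follow the DWARF call-frame encoding for "primary" opcodes: the top two bits select the instruction and the low six bits carry a small operand. A minimal C sketch of that packing follows; the `sketch_*` names are hypothetical and do not appear in the kernel source.

```c
#include <stdint.h>

/* Primary DWARF CFA opcodes: high two bits = instruction,
 * low six bits = operand (a code delta or a register number). */
enum {
    SKETCH_CFA_ADVANCE_LOC = 0x40, /* operand: code offset delta */
    SKETCH_CFA_OFFSET      = 0x80, /* operand: register; a ULEB128 factored offset follows */
    SKETCH_CFA_RESTORE     = 0xc0  /* operand: register */
};

static uint8_t sketch_cfa_advance_loc(unsigned delta)
{
    return (uint8_t)(SKETCH_CFA_ADVANCE_LOC | (delta & 0x3f));
}

static uint8_t sketch_cfa_offset(unsigned reg)
{
    return (uint8_t)(SKETCH_CFA_OFFSET | (reg & 0x3f));
}

static uint8_t sketch_cfa_restore(unsigned reg)
{
    return (uint8_t)(SKETCH_CFA_RESTORE | (reg & 0x3f));
}
```

With register 5 as %ebp and column 8 as the return address, this reproduces the bytes used in the file. The `0x02` after `0x85` is the factored offset: multiplied by the CIE's data alignment factor of -4, it gives the -8 in the comment.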
/*
* Linker script for vsyscall DSO. The vsyscall page is an ELF shared
* object prelinked to its virtual address. This script controls its layout.
*/
/* This must match <asm/fixmap.h>. */
VSYSCALL_BASE = 0xffffe000;
SECTIONS
{
. = VSYSCALL_BASE + SIZEOF_HEADERS;
.hash : { *(.hash) } :text
.dynsym : { *(.dynsym) }
.dynstr : { *(.dynstr) }
.gnu.version : { *(.gnu.version) }
.gnu.version_d : { *(.gnu.version_d) }
.gnu.version_r : { *(.gnu.version_r) }
/* This linker script is used both with -r and with -shared.
For the layouts to match, we need to skip more than enough
space for the dynamic symbol table et al. If this amount
is insufficient, ld -shared will barf. Just increase it here. */
. = VSYSCALL_BASE + 0x400;
.text.vsyscall : { *(.text.vsyscall) } :text =0x90909090
/* This is a 32bit object and we cannot easily get the offsets
into the 64bit kernel. Just hardcode them here. This assumes
that none of the stubs needs more than 0x100 bytes. */
. = VSYSCALL_BASE + 0x500;
.text.sigreturn : { *(.text.sigreturn) } :text =0x90909090
. = VSYSCALL_BASE + 0x600;
.text.rtsigreturn : { *(.text.rtsigreturn) } :text =0x90909090
.eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr
.eh_frame : { KEEP (*(.eh_frame)) } :text
.dynamic : { *(.dynamic) } :text :dynamic
.useless : {
*(.got.plt) *(.got)
*(.data .data.* .gnu.linkonce.d.*)
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
} :text
}
/*
* We must supply the ELF program headers explicitly to get just one
* PT_LOAD segment, and set the flags explicitly to make segments read-only.
*/
PHDRS
{
text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */
dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */
}
/*
* This controls what symbols we export from the DSO.
*/
VERSION
{
LINUX_2.5 {
global:
__kernel_vsyscall;
__kernel_sigreturn;
__kernel_rt_sigreturn;
local: *;
};
}
/* The ELF entry point can be used to set the AT_SYSINFO value. */
ENTRY(__kernel_vsyscall);
...@@ -7,7 +7,7 @@ EXTRA_AFLAGS := -traditional ...@@ -7,7 +7,7 @@ EXTRA_AFLAGS := -traditional
obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o \ obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o \
ptrace.o i8259.o ioport.o ldt.o setup.o time.o sys_x86_64.o \ ptrace.o i8259.o ioport.o ldt.o setup.o time.o sys_x86_64.o \
pci-dma.o x8664_ksyms.o i387.o syscall.o vsyscall.o \ pci-dma.o x8664_ksyms.o i387.o syscall.o vsyscall.o \
setup64.o bluesmoke.o bootflag.o e820.o reboot.o setup64.o bluesmoke.o bootflag.o e820.o reboot.o warmreboot.o
obj-$(CONFIG_MTRR) += mtrr/ obj-$(CONFIG_MTRR) += mtrr/
obj-$(CONFIG_ACPI) += acpi/ obj-$(CONFIG_ACPI) += acpi/
......
...@@ -94,12 +94,6 @@ wakeup_32: ...@@ -94,12 +94,6 @@ wakeup_32:
movw %ax, %ss movw %ax, %ss
mov $(wakeup_stack - __START_KERNEL_map), %esp mov $(wakeup_stack - __START_KERNEL_map), %esp
call 1f
1: popl %eax
movl $0xb8040, %ebx
call early_print
movl saved_magic - __START_KERNEL_map, %eax movl saved_magic - __START_KERNEL_map, %eax
cmpl $0x9abcdef0, %eax cmpl $0x9abcdef0, %eax
jne bogus_32_magic jne bogus_32_magic
...@@ -115,11 +109,7 @@ wakeup_32: ...@@ -115,11 +109,7 @@ wakeup_32:
movl %eax, %cr4 movl %eax, %cr4
/* Setup early boot stage 4 level pagetables */ /* Setup early boot stage 4 level pagetables */
#if 1
movl $(wakeup_level4_pgt - __START_KERNEL_map), %eax movl $(wakeup_level4_pgt - __START_KERNEL_map), %eax
#else
movl saved_cr3 - __START_KERNEL_map, %eax
#endif
movl %eax, %cr3 movl %eax, %cr3
/* Setup EFER (Extended Feature Enable Register) */ /* Setup EFER (Extended Feature Enable Register) */
...@@ -223,19 +213,6 @@ wakeup_long64: ...@@ -223,19 +213,6 @@ wakeup_long64:
.code32 .code32
early_print:
movl $16, %edx
1:
movl %eax, %ecx
andl $0xf, %ecx
shrl $4, %eax
addw $0x0e00 + '0', %ecx
movw %ecx, %ds:(%edx, %ebx)
decl %edx
decl %edx
jnz 1b
ret
.align 64 .align 64
gdta: gdta:
.word 0, 0, 0, 0 # dummy .word 0, 0, 0, 0 # dummy
......
...@@ -385,10 +385,10 @@ void __init setup_local_APIC (void) ...@@ -385,10 +385,10 @@ void __init setup_local_APIC (void)
value = apic_read(APIC_LVT0) & APIC_LVT_MASKED; value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
if (!smp_processor_id() && (pic_mode || !value)) { if (!smp_processor_id() && (pic_mode || !value)) {
value = APIC_DM_EXTINT; value = APIC_DM_EXTINT;
printk("enabled ExtINT on CPU#%d\n", smp_processor_id()); Dprintk(KERN_INFO "enabled ExtINT on CPU#%d\n", smp_processor_id());
} else { } else {
value = APIC_DM_EXTINT | APIC_LVT_MASKED; value = APIC_DM_EXTINT | APIC_LVT_MASKED;
printk("masked ExtINT on CPU#%d\n", smp_processor_id()); Dprintk(KERN_INFO "masked ExtINT on CPU#%d\n", smp_processor_id());
} }
apic_write_around(APIC_LVT0, value); apic_write_around(APIC_LVT0, value);
......
...@@ -12,6 +12,7 @@ ...@@ -12,6 +12,7 @@
#include <asm/processor.h> #include <asm/processor.h>
#include <asm/segment.h> #include <asm/segment.h>
#include <asm/thread_info.h> #include <asm/thread_info.h>
#include <asm/ia32.h>
#define DEFINE(sym, val) \ #define DEFINE(sym, val) \
asm volatile("\n->" #sym " %0 " #val : : "i" (val)) asm volatile("\n->" #sym " %0 " #val : : "i" (val))
...@@ -43,5 +44,21 @@ int main(void) ...@@ -43,5 +44,21 @@ int main(void)
ENTRY(irqstackptr); ENTRY(irqstackptr);
BLANK(); BLANK();
#undef ENTRY #undef ENTRY
#define ENTRY(entry) DEFINE(IA32_SIGCONTEXT_ ## entry, offsetof(struct sigcontext_ia32, entry))
ENTRY(eax);
ENTRY(ebx);
ENTRY(ecx);
ENTRY(edx);
ENTRY(esi);
ENTRY(edi);
ENTRY(ebp);
ENTRY(esp);
ENTRY(eip);
BLANK();
#undef ENTRY
DEFINE(IA32_RT_SIGFRAME_sigcontext,
offsetof (struct rt_sigframe32, uc.uc_mcontext));
BLANK();
return 0; return 0;
} }
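The new `ENTRY()` block above exports structure offsets as `IA32_SIGCONTEXT_*` constants so that the 32-bit vsyscall assembly can use them. The kernel emits them through `asm volatile` markers that the build post-processes. As a user-space analogue only, the sketch below formats the same kind of `#define` line with `offsetof`; `struct sketch_sigcontext` and `SKETCH_ENTRY` are hypothetical, and the struct is not the real `struct sigcontext_ia32` layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in; NOT the real struct sigcontext_ia32 layout. */
struct sketch_sigcontext {
    uint16_t gs, pad_gs;
    uint16_t fs, pad_fs;
    uint32_t edi;
    uint32_t esi;
};

/* Format "#define SKETCH_SIGCONTEXT_<member> <offset>" into buf. */
#define SKETCH_ENTRY(buf, len, member)                                   \
    snprintf((buf), (len), "#define SKETCH_SIGCONTEXT_" #member " %zu",  \
             offsetof(struct sketch_sigcontext, member))
```

Because the compiler computes the offset, the generated constants stay in sync with the structure definition and do not have to be maintained by hand.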
...@@ -129,12 +129,75 @@ static struct pci_dev *find_k8_nb(void) ...@@ -129,12 +129,75 @@ static struct pci_dev *find_k8_nb(void)
int cpu = smp_processor_id(); int cpu = smp_processor_id();
pci_for_each_dev(dev) { pci_for_each_dev(dev) {
if (dev->bus->number==0 && PCI_FUNC(dev->devfn)==3 && if (dev->bus->number==0 && PCI_FUNC(dev->devfn)==3 &&
PCI_SLOT(dev->devfn) == (24+cpu)) PCI_SLOT(dev->devfn) == (24U+cpu))
return dev; return dev;
} }
return NULL; return NULL;
} }
/* When we have kallsyms we can afford kmcedecode too. */
static char *transaction[] = {
"instruction", "data", "generic", "reserved"
};
static char *cachelevel[] = {
"level 0", "level 1", "level 2", "level generic"
};
static char *memtrans[] = {
"generic error", "generic read", "generic write", "data read",
"data write", "instruction fetch", "prefetch", "snoop",
"?", "?", "?", "?", "?", "?", "?", "?"
};
static char *partproc[] = {
"local node origin", "local node response",
"local node observed", "generic"
};
static char *timeout[] = {
"request didn't time out",
"request timed out"
};
static char *memoryio[] = {
"memory access", "res.", "i/o access", "generic"
};
static char *extendederr[] = {
"ecc error",
"crc error",
"sync error",
"mst abort",
"tgt abort",
"gart error",
"rmw error",
"wdog error",
"chipkill ecc error",
"<9>","<10>","<11>","<12>",
"<13>","<14>","<15>"
};
static char *highbits[32] = {
[31] = "previous error lost",
[30] = "error overflow",
[29] = "error uncorrected",
[28] = "error enable",
[27] = "misc error valid",
[26] = "error address valid",
[25] = "processor context corrupt",
[24] = "res24",
[23] = "res23",
/* 22-15 ecc syndrome bits */
[14] = "corrected ecc error",
[13] = "uncorrected ecc error",
[12] = "res12",
[11] = "res11",
[10] = "res10",
[9] = "res9",
[8] = "dram scrub error",
[7] = "res7",
/* 6-4 ht link number of error */
[3] = "res3",
[2] = "res2",
[1] = "err cpu0",
[0] = "err cpu1",
};
static void check_k8_nb(void) static void check_k8_nb(void)
{ {
struct pci_dev *nb; struct pci_dev *nb;
...@@ -149,20 +212,52 @@ static void check_k8_nb(void) ...@@ -149,20 +212,52 @@ static void check_k8_nb(void)
return; return;
printk(KERN_ERR "Northbridge status %08x%08x\n", printk(KERN_ERR "Northbridge status %08x%08x\n",
statushigh,statuslow); statushigh,statuslow);
if (statuslow & 0x10)
printk(KERN_ERR "GART error %d\n", statuslow & 0xf); unsigned short errcode = statuslow & 0xffff;
if (statushigh & (1<<31)) switch (errcode >> 8) {
printk(KERN_ERR "Lost an northbridge error\n"); case 0:
if (statushigh & (1<<25)) printk(KERN_ERR " GART TLB error %s %s\n",
printk(KERN_EMERG "NB status: unrecoverable\n"); transaction[(errcode >> 2) & 3],
cachelevel[errcode & 3]);
break;
case 1:
if (errcode & (1<<11)) {
printk(KERN_ERR " bus error %s %s %s %s %s\n",
partproc[(errcode >> 10) & 0x3],
timeout[(errcode >> 9) & 1],
memtrans[(errcode >> 4) & 0xf],
memoryio[(errcode >> 2) & 0x3],
cachelevel[(errcode & 0x3)]);
} else if (errcode & (1<<8)) {
printk(KERN_ERR " memory error %s %s %s\n",
memtrans[(errcode >> 4) & 0xf],
transaction[(errcode >> 2) & 0x3],
cachelevel[(errcode & 0x3)]);
} else {
printk(KERN_ERR " unknown error code %x\n", errcode);
}
break;
}
if (statushigh & ((1<<14)|(1<<13)))
printk(KERN_ERR " ECC syndrome bits %x\n",
(((statuslow >> 24) & 0xff) << 8) | ((statushigh >> 15) & 0x7f));
errcode = (statuslow >> 16) & 0xf;
printk(KERN_ERR " extended error %s\n", extendederr[(statuslow >> 16) & 0xf]);
/* should only print when it was a HyperTransport related error. */
printk(KERN_ERR " link number %x\n", (statushigh >> 4) & 3);
int i;
for (i = 0; i < 32; i++)
if (highbits[i] && (statushigh & (1<<i)))
printk(KERN_ERR " %s\n", highbits[i]);
if (statushigh & (1<<26)) { if (statushigh & (1<<26)) {
u32 addrhigh, addrlow; u32 addrhigh, addrlow;
pci_read_config_dword(nb, 0x54, &addrhigh); pci_read_config_dword(nb, 0x54, &addrhigh);
pci_read_config_dword(nb, 0x50, &addrlow); pci_read_config_dword(nb, 0x50, &addrlow);
printk(KERN_ERR "NB error address %08x%08x\n", addrhigh,addrlow); printk(KERN_ERR " error address %08x%08x\n", addrhigh,addrlow);
} }
if (statushigh & (1<<29))
printk(KERN_EMERG "Error uncorrected\n");
statushigh &= ~(1<<31); statushigh &= ~(1<<31);
pci_write_config_dword(nb, 0x4c, statushigh); pci_write_config_dword(nb, 0x4c, statushigh);
} }
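The bus-error branch of `check_k8_nb()` above slices the low 16 bits of the northbridge status into several small fields and uses each as a table index. The sketch below isolates just that field extraction, using the same shifts and masks as the hunk. `struct sketch_bus_error` and `sketch_decode_bus` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Fields of a bus-error code, split with the same shifts and masks
 * that the check_k8_nb() hunk uses. */
struct sketch_bus_error {
    unsigned partproc;   /* bits 11..10: participating processor */
    unsigned timeout;    /* bit 9 */
    unsigned memtrans;   /* bits 7..4: memory transaction type */
    unsigned memoryio;   /* bits 3..2 */
    unsigned cachelevel; /* bits 1..0 */
};

static struct sketch_bus_error sketch_decode_bus(uint16_t errcode)
{
    struct sketch_bus_error e;

    e.partproc   = (errcode >> 10) & 0x3;
    e.timeout    = (errcode >> 9) & 0x1;
    e.memtrans   = (errcode >> 4) & 0xf;
    e.memoryio   = (errcode >> 2) & 0x3;
    e.cachelevel = errcode & 0x3;
    return e;
}
```

Each mask bounds the index that reaches a string table: a 2-bit field needs a 4-entry table, and the 4-bit `memtrans` field can take 16 values, so a table indexed by it needs 16 entries.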
......
...@@ -75,7 +75,7 @@ static inline int bad_addr(unsigned long *addrp, unsigned long size) ...@@ -75,7 +75,7 @@ static inline int bad_addr(unsigned long *addrp, unsigned long size)
return 0; return 0;
} }
int __init e820_mapped(unsigned long start, unsigned long end, int type) int __init e820_mapped(unsigned long start, unsigned long end, unsigned type)
{ {
int i; int i;
for (i = 0; i < e820.nr_map; i++) { for (i = 0; i < e820.nr_map; i++) {
......
...@@ -97,8 +97,8 @@ ...@@ -97,8 +97,8 @@
/* /*
* A newly forked process directly context switches into this. * A newly forked process directly context switches into this.
*/ */
/* rdi: prev */
ENTRY(ret_from_fork) ENTRY(ret_from_fork)
movq %rax,%rdi /* prev task, returned by __switch_to -> arg1 */
call schedule_tail call schedule_tail
GET_THREAD_INFO(%rcx) GET_THREAD_INFO(%rcx)
bt $TIF_SYSCALL_TRACE,threadinfo_flags(%rcx) bt $TIF_SYSCALL_TRACE,threadinfo_flags(%rcx)
...@@ -414,6 +414,7 @@ iret_label: ...@@ -414,6 +414,7 @@ iret_label:
.previous .previous
.section .fixup,"ax" .section .fixup,"ax"
/* force a signal here? this matches i386 behaviour */ /* force a signal here? this matches i386 behaviour */
/* running with kernel gs */
bad_iret: bad_iret:
movq $-9999,%rdi /* better code? */ movq $-9999,%rdi /* better code? */
jmp do_exit jmp do_exit
...@@ -519,21 +520,27 @@ ENTRY(spurious_interrupt) ...@@ -519,21 +520,27 @@ ENTRY(spurious_interrupt)
*/ */
ENTRY(error_entry) ENTRY(error_entry)
/* rdi slot contains rax, oldrax contains error code */ /* rdi slot contains rax, oldrax contains error code */
pushq %rsi
movq 8(%rsp),%rsi /* load rax */
pushq %rdx
pushq %rcx
pushq %rsi /* store rax */
pushq %r8
pushq %r9
pushq %r10
pushq %r11
cld cld
SAVE_REST subq $14*8,%rsp
movq %rsi,13*8(%rsp)
movq 14*8(%rsp),%rsi /* load rax from rdi slot */
movq %rdx,12*8(%rsp)
movq %rcx,11*8(%rsp)
movq %rsi,10*8(%rsp) /* store rax */
movq %r8, 9*8(%rsp)
movq %r9, 8*8(%rsp)
movq %r10,7*8(%rsp)
movq %r11,6*8(%rsp)
movq %rbx,5*8(%rsp)
movq %rbp,4*8(%rsp)
movq %r12,3*8(%rsp)
movq %r13,2*8(%rsp)
movq %r14,1*8(%rsp)
movq %r15,(%rsp)
xorl %ebx,%ebx
testl $3,CS(%rsp) testl $3,CS(%rsp)
je error_kernelspace je error_kernelspace
error_swapgs: error_swapgs:
xorl %ebx,%ebx
swapgs swapgs
error_sti: error_sti:
movq %rdi,RDI(%rsp) movq %rdi,RDI(%rsp)
...@@ -557,13 +564,14 @@ error_exit: ...@@ -557,13 +564,14 @@ error_exit:
iretq iretq
error_kernelspace: error_kernelspace:
incl %ebx
/* There are two places in the kernel that can potentially fault with /* There are two places in the kernel that can potentially fault with
usergs. Handle them here. */ usergs. Handle them here. The exception handlers after
iret run with kernel gs again, so don't set the user space flag. */
cmpq $iret_label,RIP(%rsp) cmpq $iret_label,RIP(%rsp)
je error_swapgs je error_swapgs
cmpq $gs_change,RIP(%rsp) cmpq $gs_change,RIP(%rsp)
je error_swapgs je error_swapgs
movl $1,%ebx
jmp error_sti jmp error_sti
/* Reload gs selector with exception handling */ /* Reload gs selector with exception handling */
...@@ -584,7 +592,9 @@ gs_change: ...@@ -584,7 +592,9 @@ gs_change:
.quad gs_change,bad_gs .quad gs_change,bad_gs
.previous .previous
.section .fixup,"ax" .section .fixup,"ax"
/* running with kernelgs */
bad_gs: bad_gs:
swapgs /* switch back to user gs */
xorl %eax,%eax xorl %eax,%eax
movl %eax,%gs movl %eax,%gs
jmp 2b jmp 2b
...@@ -614,12 +624,8 @@ ENTRY(kernel_thread) ...@@ -614,12 +624,8 @@ ENTRY(kernel_thread)
# clone now # clone now
call do_fork call do_fork
movq %rax,RAX(%rsp)
xorl %edi,%edi xorl %edi,%edi
cmpq $-1000,%rax
jnb 1f
movl tsk_pid(%rax),%eax
1: movq %rax,RAX(%rsp)
/* /*
* It isn't worth to check for reschedule here, * It isn't worth to check for reschedule here,
......
...@@ -351,7 +351,7 @@ gdt32_end: ...@@ -351,7 +351,7 @@ gdt32_end:
ENTRY(cpu_gdt_table) ENTRY(cpu_gdt_table)
.quad 0x0000000000000000 /* NULL descriptor */ .quad 0x0000000000000000 /* NULL descriptor */
.quad 0x0000000000000000 /* unused */ .quad 0x00af9a000000ffff ^ (1<<21) /* __KERNEL_COMPAT32_CS */
.quad 0x00af9a000000ffff /* __KERNEL_CS */ .quad 0x00af9a000000ffff /* __KERNEL_CS */
.quad 0x00cf92000000ffff /* __KERNEL_DS */ .quad 0x00cf92000000ffff /* __KERNEL_DS */
.quad 0x00cffe000000ffff /* __USER32_CS */ .quad 0x00cffe000000ffff /* __USER32_CS */
......
...@@ -1060,7 +1060,8 @@ static void __init setup_ioapic_ids_from_mpc (void) ...@@ -1060,7 +1060,8 @@ static void __init setup_ioapic_ids_from_mpc (void)
phys_id_present_map |= 1 << i; phys_id_present_map |= 1 << i;
mp_ioapics[apic].mpc_apicid = i; mp_ioapics[apic].mpc_apicid = i;
} else { } else {
printk("Setting %d in the phys_id_present_map\n", mp_ioapics[apic].mpc_apicid); printk(KERN_INFO
"Using IO-APIC %d\n", mp_ioapics[apic].mpc_apicid);
phys_id_present_map |= 1 << mp_ioapics[apic].mpc_apicid; phys_id_present_map |= 1 << mp_ioapics[apic].mpc_apicid;
} }
......
/* /*
* linux/arch/i386/kernel/ioport.c * linux/arch/x86_64/kernel/ioport.c
* *
* This contains the io-permission bitmap code - written by obz, with changes * This contains the io-permission bitmap code - written by obz, with changes
* by Linus. * by Linus.
...@@ -15,34 +15,35 @@ ...@@ -15,34 +15,35 @@
#include <linux/smp_lock.h> #include <linux/smp_lock.h>
#include <linux/stddef.h> #include <linux/stddef.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <asm/io.h>
/* Set EXTENT bits starting at BASE in BITMAP to value TURN_ON. */ /* Set EXTENT bits starting at BASE in BITMAP to value TURN_ON. */
static void set_bitmap(unsigned long *bitmap, short base, short extent, int new_value) static void set_bitmap(unsigned long *bitmap, short base, short extent, int new_value)
{ {
int mask; unsigned long mask;
unsigned long *bitmap_base = bitmap + (base >> 6); unsigned long *bitmap_base = bitmap + (base / sizeof(unsigned long));
unsigned short low_index = base & 0x3f; unsigned short low_index = base & 0x3f;
int length = low_index + extent; int length = low_index + extent;
if (low_index != 0) { if (low_index != 0) {
mask = (~0 << low_index); mask = (~0UL << low_index);
if (length < 64) if (length < 64)
mask &= ~(~0 << length); mask &= ~(~0UL << length);
if (new_value) if (new_value)
*bitmap_base++ |= mask; *bitmap_base++ |= mask;
else else
*bitmap_base++ &= ~mask; *bitmap_base++ &= ~mask;
length -= 32; length -= 64;
} }
mask = (new_value ? ~0 : 0); mask = (new_value ? ~0UL : 0UL);
while (length >= 64) { while (length >= 64) {
*bitmap_base++ = mask; *bitmap_base++ = mask;
length -= 64; length -= 64;
} }
if (length > 0) { if (length > 0) {
mask = ~(~0 << length); mask = ~(~0UL << length);
if (new_value) if (new_value)
*bitmap_base++ |= mask; *bitmap_base++ |= mask;
else else
...@@ -113,3 +114,10 @@ asmlinkage long sys_iopl(unsigned int level, struct pt_regs regs) ...@@ -113,3 +114,10 @@ asmlinkage long sys_iopl(unsigned int level, struct pt_regs regs)
regs.eflags = (regs.eflags & 0xffffffffffffcfff) | (level << 12); regs.eflags = (regs.eflags & 0xffffffffffffcfff) | (level << 12);
return 0; return 0;
} }
void eat_key(void)
{
if (inb(0x60) & 1)
inb(0x64);
}
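The `set_bitmap()` hunk above replaces `~0` with `~0UL` and widens `mask` to `unsigned long`. In C, `~0` has type `int`, so shifting it cannot build a correct mask over a 64-bit word: left-shifting a negative `int` is undefined, as is shifting by 32 or more. A minimal sketch of the intended single-word mask follows, using `uint64_t` for portability; `sketch_range_mask` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Mask with bits [low, low + len) set inside one 64-bit word.
 * Requires low < 64, len >= 1 and low + len <= 64. */
static uint64_t sketch_range_mask(unsigned low, unsigned len)
{
    uint64_t mask = ~UINT64_C(0) << low;      /* clear bits below low */

    if (low + len < 64)                       /* avoid shifting by 64 */
        mask &= ~(~UINT64_C(0) << (low + len));
    return mask;
}
```

For example, `sketch_range_mask(32, 32)` selects exactly the upper half of the word, which a 32-bit `int` mask cannot represent.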
...@@ -795,7 +795,7 @@ static unsigned int parse_hex_value (const char *buffer, ...@@ -795,7 +795,7 @@ static unsigned int parse_hex_value (const char *buffer,
{ {
unsigned char hexnum [HEX_DIGITS]; unsigned char hexnum [HEX_DIGITS];
unsigned long value; unsigned long value;
int i; unsigned i;
if (!count) if (!count)
return -EINVAL; return -EINVAL;
......
...@@ -32,13 +32,13 @@ static void flush_ldt(void *null) ...@@ -32,13 +32,13 @@ static void flush_ldt(void *null)
} }
#endif #endif
static int alloc_ldt(mm_context_t *pc, int mincount, int reload) static int alloc_ldt(mm_context_t *pc, unsigned mincount, int reload)
{ {
void *oldldt; void *oldldt;
void *newldt; void *newldt;
int oldsize; unsigned oldsize;
if (mincount <= pc->size) if (mincount <= (unsigned)pc->size)
return 0; return 0;
oldsize = pc->size; oldsize = pc->size;
mincount = (mincount+511)&(~511); mincount = (mincount+511)&(~511);
...@@ -63,7 +63,7 @@ static int alloc_ldt(mm_context_t *pc, int mincount, int reload) ...@@ -63,7 +63,7 @@ static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
preempt_disable(); preempt_disable();
load_LDT(pc); load_LDT(pc);
if (current->mm->cpu_vm_mask != (1<<smp_processor_id())) if (current->mm->cpu_vm_mask != (1UL<<smp_processor_id()))
smp_call_function(flush_ldt, 0, 1, 1); smp_call_function(flush_ldt, 0, 1, 1);
preempt_enable(); preempt_enable();
#else #else
...@@ -116,7 +116,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm) ...@@ -116,7 +116,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
void destroy_context(struct mm_struct *mm) void destroy_context(struct mm_struct *mm)
{ {
if (mm->context.size) { if (mm->context.size) {
if (mm->context.size*LDT_ENTRY_SIZE > PAGE_SIZE) if ((unsigned)mm->context.size*LDT_ENTRY_SIZE > PAGE_SIZE)
vfree(mm->context.ldt); vfree(mm->context.ldt);
else else
kfree(mm->context.ldt); kfree(mm->context.ldt);
...@@ -190,7 +190,7 @@ static int write_ldt(void * ptr, unsigned long bytecount, int oldmode) ...@@ -190,7 +190,7 @@ static int write_ldt(void * ptr, unsigned long bytecount, int oldmode)
} }
down(&mm->context.sem); down(&mm->context.sem);
if (ldt_info.entry_number >= mm->context.size) { if (ldt_info.entry_number >= (unsigned)mm->context.size) {
error = alloc_ldt(&current->mm->context, ldt_info.entry_number+1, 1); error = alloc_ldt(&current->mm->context, ldt_info.entry_number+1, 1);
if (error < 0) if (error < 0)
goto out_unlock; goto out_unlock;
......
...@@ -892,11 +892,15 @@ void __init mp_parse_prt (void) ...@@ -892,11 +892,15 @@ void __init mp_parse_prt (void)
list_for_each(node, &acpi_prt.entries) { list_for_each(node, &acpi_prt.entries) {
entry = list_entry(node, struct acpi_prt_entry, node); entry = list_entry(node, struct acpi_prt_entry, node);
/* We're only interested in static (non-link) entries. */ /* Need to get irq for dynamic entry */
if (entry->link.handle) if (entry->link.handle) {
irq = acpi_pci_link_get_irq(entry->link.handle, entry->link.index);
if (!irq)
continue; continue;
}
else
irq = entry->link.index; irq = entry->link.index;
ioapic = mp_find_ioapic(irq); ioapic = mp_find_ioapic(irq);
if (ioapic < 0) if (ioapic < 0)
continue; continue;
......
...@@ -32,7 +32,6 @@ int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, ...@@ -32,7 +32,6 @@ int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
int i; int i;
BUG_ON(direction == PCI_DMA_NONE); BUG_ON(direction == PCI_DMA_NONE);
for (i = 0; i < nents; i++ ) { for (i = 0; i < nents; i++ ) {
struct scatterlist *s = &sg[i]; struct scatterlist *s = &sg[i];
......
...@@ -14,15 +14,10 @@ ...@@ -14,15 +14,10 @@
/* /*
* Notebook: * Notebook:
agpgart_be
check if the simple reservation scheme is enough.
possible future tuning: possible future tuning:
fast path for sg streaming mappings fast path for sg streaming mappings - only take the locks once.
more intelligent flush strategy - flush only a single NB? flush only when more intelligent flush strategy - flush only the NB of the CPU directly
gart area fills up and alloc_iommu wraps. connected to the device?
don't flush on allocation - need to unmap the gart area first to avoid prefetches
by the CPU
move boundary between IOMMU and AGP in GART dynamically move boundary between IOMMU and AGP in GART dynamically
*/ */
...@@ -67,7 +62,8 @@ static unsigned long *iommu_gart_bitmap; /* guarded by iommu_bitmap_lock */ ...@@ -67,7 +62,8 @@ static unsigned long *iommu_gart_bitmap; /* guarded by iommu_bitmap_lock */
#define GPTE_VALID 1 #define GPTE_VALID 1
#define GPTE_COHERENT 2 #define GPTE_COHERENT 2
#define GPTE_ENCODE(x) (((x) & 0xfffff000) | (((x) >> 32) << 4) | GPTE_VALID | GPTE_COHERENT) #define GPTE_ENCODE(x) \
(((x) & 0xfffff000) | (((x) >> 32) << 4) | GPTE_VALID | GPTE_COHERENT)
#define GPTE_DECODE(x) (((x) & 0xfffff000) | (((u64)(x) & 0xff0) << 28)) #define GPTE_DECODE(x) (((x) & 0xfffff000) | (((u64)(x) & 0xff0) << 28))
#define for_all_nb(dev) \ #define for_all_nb(dev) \
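`GPTE_ENCODE` and `GPTE_DECODE` above pack a page-aligned physical address of up to 40 bits into a 32-bit GART entry: bits 31..12 stay in place, bits 39..32 move into bits 11..4, and the two low bits carry the valid and coherent flags. The round trip can be sketched as plain C functions; the `sketch_*` names are hypothetical, and the arithmetic is copied from the macros.

```c
#include <stdint.h>

#define SKETCH_GPTE_VALID    1u
#define SKETCH_GPTE_COHERENT 2u

/* Pack a page-aligned physical address (< 2^40) into a 32-bit entry. */
static uint32_t sketch_gpte_encode(uint64_t phys)
{
    return (uint32_t)((phys & 0xfffff000u) | ((phys >> 32) << 4) |
                      SKETCH_GPTE_VALID | SKETCH_GPTE_COHERENT);
}

/* Recover the physical address; the flag bits are dropped. */
static uint64_t sketch_gpte_decode(uint32_t pte)
{
    return (pte & 0xfffff000u) | (((uint64_t)pte & 0xff0) << 28);
}
```

The decode shift is 28 rather than 32 because the high byte already sits 4 bits up inside the entry.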
...@@ -90,20 +86,23 @@ AGPEXTERN __u32 *agp_gatt_table; ...@@ -90,20 +86,23 @@ AGPEXTERN __u32 *agp_gatt_table;
static unsigned long next_bit; /* protected by iommu_bitmap_lock */ static unsigned long next_bit; /* protected by iommu_bitmap_lock */
static unsigned long alloc_iommu(int size) static unsigned long alloc_iommu(int size, int *flush)
{ {
unsigned long offset, flags; unsigned long offset, flags;
spin_lock_irqsave(&iommu_bitmap_lock, flags); spin_lock_irqsave(&iommu_bitmap_lock, flags);
offset = find_next_zero_string(iommu_gart_bitmap,next_bit,iommu_pages,size); offset = find_next_zero_string(iommu_gart_bitmap,next_bit,iommu_pages,size);
if (offset == -1) if (offset == -1) {
*flush = 1;
offset = find_next_zero_string(iommu_gart_bitmap,0,next_bit,size); offset = find_next_zero_string(iommu_gart_bitmap,0,next_bit,size);
}
if (offset != -1) { if (offset != -1) {
set_bit_string(iommu_gart_bitmap, offset, size); set_bit_string(iommu_gart_bitmap, offset, size);
next_bit = offset+size; next_bit = offset+size;
if (next_bit >= iommu_pages) if (next_bit >= iommu_pages) {
next_bit = 0; next_bit = 0;
*flush = 1;
}
} }
spin_unlock_irqrestore(&iommu_bitmap_lock, flags); spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
return offset; return offset;
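The reworked `alloc_iommu()` above reports through `*flush` whether the search wrapped around the bitmap, so callers flush the GART only in that case. Together with the removal of `next_bit = offset` from `free_iommu()`, the intent appears to be that freed slots are not reused until the allocator wraps. The sketch below shows the same next-fit pattern over a tiny byte array instead of the kernel bitmap helpers; all `sketch_*` names and the 8-slot size are assumptions for illustration, and `n` must be at least 1.

```c
#include <stddef.h>

#define SKETCH_PAGES 8

static unsigned char sketch_used[SKETCH_PAGES]; /* 1 = slot in use */
static size_t sketch_next;                      /* where the next search starts */

/* First offset of n consecutive free slots in [start, end), or -1. */
static long sketch_find(size_t start, size_t end, size_t n)
{
    size_t i, run = 0;

    for (i = start; i < end; i++) {
        run = sketch_used[i] ? 0 : run + 1;
        if (run == n)
            return (long)(i + 1 - n);
    }
    return -1;
}

static void sketch_mark(size_t off, size_t n, unsigned char val)
{
    size_t i;

    for (i = 0; i < n; i++)
        sketch_used[off + i] = val;
}

/* Next-fit allocation; sets *flush when the search wraps around. */
static long sketch_alloc(size_t n, int *flush)
{
    long off = sketch_find(sketch_next, SKETCH_PAGES, n);

    if (off == -1) {
        *flush = 1;                           /* wrapped: old slots get reused */
        off = sketch_find(0, sketch_next, n);
    }
    if (off != -1) {
        sketch_mark((size_t)off, n, 1);
        sketch_next = (size_t)off + n;
        if (sketch_next >= SKETCH_PAGES) {
            sketch_next = 0;
            *flush = 1;
        }
    }
    return off;
}

/* Freeing does not move sketch_next, so reuse waits for a wrap. */
static void sketch_free(size_t off, size_t n)
{
    sketch_mark(off, n, 0);
}
```

A caller initializes `int flush = 0;` before allocating and performs the expensive flush only when the flag comes back set.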
...@@ -114,7 +113,6 @@ static void free_iommu(unsigned long offset, int size) ...@@ -114,7 +113,6 @@ static void free_iommu(unsigned long offset, int size)
unsigned long flags; unsigned long flags;
spin_lock_irqsave(&iommu_bitmap_lock, flags); spin_lock_irqsave(&iommu_bitmap_lock, flags);
clear_bit_string(iommu_gart_bitmap, offset, size); clear_bit_string(iommu_gart_bitmap, offset, size);
next_bit = offset;
spin_unlock_irqrestore(&iommu_bitmap_lock, flags); spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
} }
...@@ -137,6 +135,7 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, ...@@ -137,6 +135,7 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
int gfp = GFP_ATOMIC; int gfp = GFP_ATOMIC;
int i; int i;
unsigned long iommu_page; unsigned long iommu_page;
int flush = 0;
if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || no_iommu) if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || no_iommu)
gfp |= GFP_DMA; gfp |= GFP_DMA;
...@@ -150,9 +149,10 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, ...@@ -150,9 +149,10 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
if (memory == NULL) { if (memory == NULL) {
return NULL; return NULL;
} else { } else {
int high = (unsigned long)virt_to_bus(memory) + size int high = 0, mmu;
>= 0xffffffff; if (((unsigned long)virt_to_bus(memory) + size) > 0xffffffffUL)
int mmu = high; high = 1;
mmu = 1;
if (force_mmu && !(gfp & GFP_DMA)) if (force_mmu && !(gfp & GFP_DMA))
mmu = 1; mmu = 1;
if (no_iommu) { if (no_iommu) {
...@@ -168,7 +168,7 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, ...@@ -168,7 +168,7 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
size >>= PAGE_SHIFT; size >>= PAGE_SHIFT;
iommu_page = alloc_iommu(size); iommu_page = alloc_iommu(size, &flush);
if (iommu_page == -1) if (iommu_page == -1)
goto error; goto error;
...@@ -183,6 +183,7 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, ...@@ -183,6 +183,7 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem); iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
} }
if (flush)
flush_gart(); flush_gart();
*dma_handle = iommu_bus_base + (iommu_page << PAGE_SHIFT); *dma_handle = iommu_bus_base + (iommu_page << PAGE_SHIFT);
return memory; return memory;
...@@ -199,25 +200,24 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, ...@@ -199,25 +200,24 @@ void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
void pci_free_consistent(struct pci_dev *hwdev, size_t size, void pci_free_consistent(struct pci_dev *hwdev, size_t size,
void *vaddr, dma_addr_t bus) void *vaddr, dma_addr_t bus)
{ {
u64 pte;
unsigned long iommu_page; unsigned long iommu_page;
int i;
size = round_up(size, PAGE_SIZE); size = round_up(size, PAGE_SIZE);
if (bus < iommu_bus_base || bus > iommu_bus_base + iommu_size) { if (bus >= iommu_bus_base && bus <= iommu_bus_base + iommu_size) {
free_pages((unsigned long)vaddr, get_order(size)); unsigned pages = size >> PAGE_SHIFT;
return; iommu_page = (bus - iommu_bus_base) >> PAGE_SHIFT;
} vaddr = __va(GPTE_DECODE(iommu_gatt_base[iommu_page]));
size >>= PAGE_SHIFT; #ifdef CONFIG_IOMMU_DEBUG
iommu_page = (bus - iommu_bus_base) / PAGE_SIZE; int i;
for (i = 0; i < size; i++) { for (i = 0; i < pages; i++) {
pte = iommu_gatt_base[iommu_page + i]; u64 pte = iommu_gatt_base[iommu_page + i];
BUG_ON((pte & GPTE_VALID) == 0); BUG_ON((pte & GPTE_VALID) == 0);
iommu_gatt_base[iommu_page + i] = 0; iommu_gatt_base[iommu_page + i] = 0;
free_page((unsigned long) __va(GPTE_DECODE(pte)));
} }
flush_gart(); #endif
free_iommu(iommu_page, size); free_iommu(iommu_page, pages);
}
free_pages((unsigned long)vaddr, get_order(size));
} }
#ifdef CONFIG_IOMMU_LEAK #ifdef CONFIG_IOMMU_LEAK
...@@ -257,7 +257,7 @@ static void iommu_full(struct pci_dev *dev, void *addr, size_t size, int dir) ...@@ -257,7 +257,7 @@ static void iommu_full(struct pci_dev *dev, void *addr, size_t size, int dir)
*/ */
printk(KERN_ERR printk(KERN_ERR
"PCI-DMA: Error: ran out out IOMMU space for %p size %lu at device %s[%s]\n", "PCI-DMA: Out of IOMMU space for %p size %lu at device %s[%s]\n",
addr,size, dev ? dev->dev.name : "?", dev ? dev->slot_name : "?"); addr,size, dev ? dev->dev.name : "?", dev ? dev->slot_name : "?");
if (size > PAGE_SIZE*EMERGENCY_PAGES) { if (size > PAGE_SIZE*EMERGENCY_PAGES) {
...@@ -287,12 +287,12 @@ static inline int need_iommu(struct pci_dev *dev, unsigned long addr, size_t siz ...@@ -287,12 +287,12 @@ static inline int need_iommu(struct pci_dev *dev, unsigned long addr, size_t siz
return mmu; return mmu;
} }
dma_addr_t __pci_map_single(struct pci_dev *dev, void *addr, size_t size, dma_addr_t pci_map_single(struct pci_dev *dev, void *addr, size_t size, int dir)
int dir, int flush)
{ {
unsigned long iommu_page; unsigned long iommu_page;
unsigned long phys_mem, bus; unsigned long phys_mem, bus;
int i, npages; int i, npages;
int flush = 0;
BUG_ON(dir == PCI_DMA_NONE); BUG_ON(dir == PCI_DMA_NONE);
...@@ -302,7 +302,7 @@ dma_addr_t __pci_map_single(struct pci_dev *dev, void *addr, size_t size, ...@@ -302,7 +302,7 @@ dma_addr_t __pci_map_single(struct pci_dev *dev, void *addr, size_t size,
npages = round_up(size + ((u64)addr & ~PAGE_MASK), PAGE_SIZE) >> PAGE_SHIFT; npages = round_up(size + ((u64)addr & ~PAGE_MASK), PAGE_SIZE) >> PAGE_SHIFT;
iommu_page = alloc_iommu(npages); iommu_page = alloc_iommu(npages, &flush);
if (iommu_page == -1) { if (iommu_page == -1) {
iommu_full(dev, addr, size, dir); iommu_full(dev, addr, size, dir);
return iommu_bus_base; return iommu_bus_base;
...@@ -343,12 +343,14 @@ void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, ...@@ -343,12 +343,14 @@ void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
size_t size, int direction) size_t size, int direction)
{ {
unsigned long iommu_page; unsigned long iommu_page;
int i, npages; int npages;
if (dma_addr < iommu_bus_base + EMERGENCY_PAGES*PAGE_SIZE || if (dma_addr < iommu_bus_base + EMERGENCY_PAGES*PAGE_SIZE ||
dma_addr > iommu_bus_base + iommu_size) dma_addr > iommu_bus_base + iommu_size)
return; return;
iommu_page = (dma_addr - iommu_bus_base)>>PAGE_SHIFT; iommu_page = (dma_addr - iommu_bus_base)>>PAGE_SHIFT;
npages = round_up(size + (dma_addr & ~PAGE_MASK), PAGE_SIZE) >> PAGE_SHIFT; npages = round_up(size + (dma_addr & ~PAGE_MASK), PAGE_SIZE) >> PAGE_SHIFT;
#ifdef CONFIG_IOMMU_DEBUG
int i;
for (i = 0; i < npages; i++) { for (i = 0; i < npages; i++) {
iommu_gatt_base[iommu_page + i] = 0; iommu_gatt_base[iommu_page + i] = 0;
#ifdef CONFIG_IOMMU_LEAK #ifdef CONFIG_IOMMU_LEAK
...@@ -356,11 +358,11 @@ void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, ...@@ -356,11 +358,11 @@ void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
iommu_leak_tab[iommu_page + i] = 0; iommu_leak_tab[iommu_page + i] = 0;
#endif #endif
} }
flush_gart(); #endif
free_iommu(iommu_page, npages); free_iommu(iommu_page, npages);
} }
EXPORT_SYMBOL(__pci_map_single); EXPORT_SYMBOL(pci_map_single);
EXPORT_SYMBOL(pci_unmap_single); EXPORT_SYMBOL(pci_unmap_single);
static __init unsigned long check_iommu_size(unsigned long aper, u64 aper_size) static __init unsigned long check_iommu_size(unsigned long aper, u64 aper_size)
...@@ -407,7 +409,7 @@ static __init unsigned read_aperture(struct pci_dev *dev, u32 *size) ...@@ -407,7 +409,7 @@ static __init unsigned read_aperture(struct pci_dev *dev, u32 *size)
* Private Northbridge GATT initialization in case we cannot use the * Private Northbridge GATT initialization in case we cannot use the
* AGP driver for some reason. * AGP driver for some reason.
*/ */
static __init int init_k8_gatt(agp_kern_info *info) static __init int init_k8_gatt(struct agp_kern_info *info)
{ {
struct pci_dev *dev; struct pci_dev *dev;
void *gatt; void *gatt;
...@@ -443,7 +445,7 @@ static __init int init_k8_gatt(agp_kern_info *info) ...@@ -443,7 +445,7 @@ static __init int init_k8_gatt(agp_kern_info *info)
u32 ctl; u32 ctl;
u32 gatt_reg; u32 gatt_reg;
gatt_reg = ((u64)gatt) >> 12; gatt_reg = __pa(gatt) >> 12;
gatt_reg <<= 4; gatt_reg <<= 4;
pci_write_config_dword(dev, 0x98, gatt_reg); pci_write_config_dword(dev, 0x98, gatt_reg);
pci_read_config_dword(dev, 0x90, &ctl); pci_read_config_dword(dev, 0x90, &ctl);
...@@ -465,9 +467,11 @@ static __init int init_k8_gatt(agp_kern_info *info) ...@@ -465,9 +467,11 @@ static __init int init_k8_gatt(agp_kern_info *info)
return -1; return -1;
} }
extern int agp_amdk8_init(void);
void __init pci_iommu_init(void) void __init pci_iommu_init(void)
{ {
agp_kern_info info; struct agp_kern_info info;
unsigned long aper_size; unsigned long aper_size;
unsigned long iommu_start; unsigned long iommu_start;
...@@ -476,7 +480,6 @@ void __init pci_iommu_init(void) ...@@ -476,7 +480,6 @@ void __init pci_iommu_init(void)
#else #else
/* Add other K8 AGP bridge drivers here */ /* Add other K8 AGP bridge drivers here */
no_agp = no_agp || no_agp = no_agp ||
(agp_init() < 0) ||
(agp_amdk8_init() < 0) || (agp_amdk8_init() < 0) ||
(agp_copy_info(&info) < 0); (agp_copy_info(&info) < 0);
#endif #endif
...@@ -536,8 +539,17 @@ void __init pci_iommu_init(void) ...@@ -536,8 +539,17 @@ void __init pci_iommu_init(void)
iommu_gatt_base = agp_gatt_table + (iommu_start>>PAGE_SHIFT); iommu_gatt_base = agp_gatt_table + (iommu_start>>PAGE_SHIFT);
bad_dma_address = iommu_bus_base; bad_dma_address = iommu_bus_base;
change_page_attr(virt_to_page(__va(iommu_start)), iommu_pages, PAGE_KERNEL); /*
global_flush_tlb(); * Unmap the IOMMU part of the GART. The alias of the page is always mapped
* with cache enabled and there is no full cache coherency across the GART
* remapping. The unmapping avoids automatic prefetches from the CPU
* allocating cache lines in there. All CPU accesses are done via the
* direct mapping to the backing memory. The GART address is only used by PCI
* devices.
*/
clear_kernel_mapping((unsigned long)__va(iommu_bus_base), iommu_size);
flush_gart();
} }
/* iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]] /* iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]]
......
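Both pci_map_single and pci_unmap_single in the pci-gart.c hunks above size their GART range with round_up(size + (addr & ~PAGE_MASK), PAGE_SIZE) >> PAGE_SHIFT. The following is a minimal user-space sketch of that arithmetic, assuming 4 KiB pages; every sketch_* and SKETCH_* name is hypothetical and stands in for the kernel macros rather than reproducing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, mirroring PAGE_SHIFT/PAGE_SIZE/PAGE_MASK. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Round x up to the next multiple of a power-of-two alignment. */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Number of pages touched by the byte range [addr, addr + size). */
static unsigned long sketch_npages(uint64_t addr, size_t size)
{
	/* Offset of addr inside its page; the mapping starts that far in. */
	uint64_t offset = addr & ~SKETCH_PAGE_MASK;

	return (unsigned long)(sketch_round_up(size + offset, SKETCH_PAGE_SIZE)
			       >> SKETCH_PAGE_SHIFT);
}
```

Adding the in-page offset before rounding is what makes a small buffer that straddles a page boundary count as two pages, so map and unmap agree on the range as long as they are given the same size and the same low address bits.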
...@@ -216,7 +216,12 @@ void flush_thread(void) ...@@ -216,7 +216,12 @@ void flush_thread(void)
{ {
struct task_struct *tsk = current; struct task_struct *tsk = current;
memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8); tsk->thread.debugreg0 = 0;
tsk->thread.debugreg1 = 0;
tsk->thread.debugreg2 = 0;
tsk->thread.debugreg3 = 0;
tsk->thread.debugreg6 = 0;
tsk->thread.debugreg7 = 0;
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array)); memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
/* /*
* Forget coprocessor state.. * Forget coprocessor state..
...@@ -285,7 +290,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp, ...@@ -285,7 +290,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp,
childregs->rax = 0; childregs->rax = 0;
childregs->rsp = rsp; childregs->rsp = rsp;
if (rsp == ~0) { if (rsp == ~0UL) {
childregs->rsp = (unsigned long)childregs; childregs->rsp = (unsigned long)childregs;
} }
p->set_child_tid = p->clear_child_tid = NULL; p->set_child_tid = p->clear_child_tid = NULL;
...@@ -294,7 +299,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp, ...@@ -294,7 +299,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp,
p->thread.rsp0 = (unsigned long) (childregs+1); p->thread.rsp0 = (unsigned long) (childregs+1);
p->thread.userrsp = me->thread.userrsp; p->thread.userrsp = me->thread.userrsp;
p->thread.rip = (unsigned long) ret_from_fork; set_ti_thread_flag(p->thread_info, TIF_FORK);
p->thread.fs = me->thread.fs; p->thread.fs = me->thread.fs;
p->thread.gs = me->thread.gs; p->thread.gs = me->thread.gs;
...@@ -335,8 +340,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp, ...@@ -335,8 +340,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp,
/* /*
* This special macro can be used to load a debugging register * This special macro can be used to load a debugging register
*/ */
#define loaddebug(thread,register) \ #define loaddebug(thread,r) set_debug(thread->debugreg ## r, r)
set_debug(thread->debugreg[register], register)
/* /*
* switch_to(x,y) should switch tasks from x to y. * switch_to(x,y) should switch tasks from x to y.
...@@ -422,7 +426,7 @@ struct task_struct *__switch_to(struct task_struct *prev_p, struct task_struct * ...@@ -422,7 +426,7 @@ struct task_struct *__switch_to(struct task_struct *prev_p, struct task_struct *
/* /*
* Now maybe reload the debug registers * Now maybe reload the debug registers
*/ */
if (unlikely(next->debugreg[7])) { if (unlikely(next->debugreg7)) {
loaddebug(next, 0); loaddebug(next, 0);
loaddebug(next, 1); loaddebug(next, 1);
loaddebug(next, 2); loaddebug(next, 2);
...@@ -490,19 +494,15 @@ void set_personality_64bit(void) ...@@ -490,19 +494,15 @@ void set_personality_64bit(void)
asmlinkage long sys_fork(struct pt_regs regs) asmlinkage long sys_fork(struct pt_regs regs)
{ {
struct task_struct *p; return do_fork(SIGCHLD, regs.rsp, &regs, 0, NULL, NULL);
p = do_fork(SIGCHLD, regs.rsp, &regs, 0, NULL, NULL);
return IS_ERR(p) ? PTR_ERR(p) : p->pid;
} }
asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp, void *parent_tid, void *child_tid, struct pt_regs regs) asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp, void *parent_tid, void *child_tid, struct pt_regs regs)
{ {
struct task_struct *p;
if (!newsp) if (!newsp)
newsp = regs.rsp; newsp = regs.rsp;
p = do_fork(clone_flags & ~CLONE_IDLETASK, newsp, &regs, 0, return do_fork(clone_flags & ~CLONE_IDLETASK, newsp, &regs, 0,
parent_tid, child_tid); parent_tid, child_tid);
return IS_ERR(p) ? PTR_ERR(p) : p->pid;
} }
/* /*
...@@ -517,10 +517,8 @@ asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp, void * ...@@ -517,10 +517,8 @@ asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp, void *
*/ */
asmlinkage long sys_vfork(struct pt_regs regs) asmlinkage long sys_vfork(struct pt_regs regs)
{ {
struct task_struct *p; return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.rsp, &regs, 0,
p = do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.rsp, &regs, 0,
NULL, NULL); NULL, NULL);
return IS_ERR(p) ? PTR_ERR(p) : p->pid;
} }
/* /*
......
...@@ -178,11 +178,11 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno) ...@@ -178,11 +178,11 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
} }
asmlinkage long sys_ptrace(long request, long pid, long addr, long data) asmlinkage long sys_ptrace(long request, long pid, unsigned long addr, long data)
{ {
struct task_struct *child; struct task_struct *child;
struct user * dummy = NULL;
long i, ret; long i, ret;
unsigned ui;
/* This lock_kernel fixes a subtle race with suid exec */ /* This lock_kernel fixes a subtle race with suid exec */
lock_kernel(); lock_kernel();
...@@ -240,18 +240,35 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data) ...@@ -240,18 +240,35 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data)
unsigned long tmp; unsigned long tmp;
ret = -EIO; ret = -EIO;
if ((addr & 7) || addr < 0 || if ((addr & 7) ||
addr > sizeof(struct user) - 7) addr > sizeof(struct user) - 7)
break; break;
tmp = 0; /* Default return condition */ switch (addr) {
if(addr < sizeof(struct user_regs_struct)) case 0 ... sizeof(struct user_regs_struct):
tmp = getreg(child, addr); tmp = getreg(child, addr);
if(addr >= (long) &dummy->u_debugreg[0] && break;
addr <= (long) &dummy->u_debugreg[7]){ case offsetof(struct user, u_debugreg[0]):
addr -= (long) &dummy->u_debugreg[0]; tmp = child->thread.debugreg0;
addr = addr >> 3; break;
tmp = child->thread.debugreg[addr]; case offsetof(struct user, u_debugreg[1]):
tmp = child->thread.debugreg1;
break;
case offsetof(struct user, u_debugreg[2]):
tmp = child->thread.debugreg2;
break;
case offsetof(struct user, u_debugreg[3]):
tmp = child->thread.debugreg3;
break;
case offsetof(struct user, u_debugreg[6]):
tmp = child->thread.debugreg6;
break;
case offsetof(struct user, u_debugreg[7]):
tmp = child->thread.debugreg7;
break;
default:
tmp = 0;
break;
} }
ret = put_user(tmp,(unsigned long *) data); ret = put_user(tmp,(unsigned long *) data);
break; break;
...@@ -268,47 +285,53 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data) ...@@ -268,47 +285,53 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data)
case PTRACE_POKEUSR: /* write the word at location addr in the USER area */ case PTRACE_POKEUSR: /* write the word at location addr in the USER area */
ret = -EIO; ret = -EIO;
if ((addr & 7) || addr < 0 || if ((addr & 7) ||
addr > sizeof(struct user) - 7) addr > sizeof(struct user) - 7)
break; break;
if (addr < sizeof(struct user_regs_struct)) { switch (addr) {
case 0 ... sizeof(struct user_regs_struct):
ret = putreg(child, addr, data); ret = putreg(child, addr, data);
break; break;
} /* Disallow setting a breakpoint in the vsyscall area */
/* We need to be very careful here. We implicitly case offsetof(struct user, u_debugreg[0]):
want to modify a portion of the task_struct, and we if (data >= TASK_SIZE-7) break;
have to be selective about what portions we allow someone child->thread.debugreg0 = data;
to modify. */ ret = 0;
break;
ret = -EIO; case offsetof(struct user, u_debugreg[1]):
if(addr >= (long) &dummy->u_debugreg[0] && if (data >= TASK_SIZE-7) break;
addr <= (long) &dummy->u_debugreg[7]){ child->thread.debugreg1 = data;
ret = 0;
if(addr == (long) &dummy->u_debugreg[4]) break; break;
if(addr == (long) &dummy->u_debugreg[5]) break; case offsetof(struct user, u_debugreg[2]):
if(addr < (long) &dummy->u_debugreg[4] && if (data >= TASK_SIZE-7) break;
((unsigned long) data) >= TASK_SIZE-3) break; child->thread.debugreg2 = data;
ret = 0;
if (addr == (long) &dummy->u_debugreg[6]) { break;
case offsetof(struct user, u_debugreg[3]):
if (data >= TASK_SIZE-7) break;
child->thread.debugreg3 = data;
ret = 0;
break;
case offsetof(struct user, u_debugreg[6]):
if (data >> 32) if (data >> 32)
goto out_tsk; break;
} child->thread.debugreg6 = data;
ret = 0;
if(addr == (long) &dummy->u_debugreg[7]) { break;
case offsetof(struct user, u_debugreg[7]):
data &= ~DR_CONTROL_RESERVED; data &= ~DR_CONTROL_RESERVED;
for(i=0; i<4; i++) for(i=0; i<4; i++)
if ((0x5454 >> ((data >> (16 + 4*i)) & 0xf)) & 1) if ((0x5454 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
goto out_tsk; break;
} if (i == 4) {
child->thread.debugreg7 = data;
addr -= (long) &dummy->u_debugreg;
addr = addr >> 3;
child->thread.debugreg[addr] = data;
ret = 0; ret = 0;
} }
break; break;
}
break;
case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */ case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */
case PTRACE_CONT: { /* restart after signal. */ case PTRACE_CONT: { /* restart after signal. */
long tmp; long tmp;
...@@ -408,8 +431,8 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data) ...@@ -408,8 +431,8 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data)
ret = -EIO; ret = -EIO;
break; break;
} }
for ( i = 0; i < sizeof(struct user_regs_struct); i += sizeof(long) ) { for (ui = 0; ui < sizeof(struct user_regs_struct); ui += sizeof(long)) {
__put_user(getreg(child, i),(unsigned long *) data); __put_user(getreg(child, ui),(unsigned long *) data);
data += sizeof(long); data += sizeof(long);
} }
ret = 0; ret = 0;
...@@ -422,9 +445,9 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data) ...@@ -422,9 +445,9 @@ asmlinkage long sys_ptrace(long request, long pid, long addr, long data)
ret = -EIO; ret = -EIO;
break; break;
} }
for ( i = 0; i < sizeof(struct user_regs_struct); i += sizeof(long) ) { for (ui = 0; ui < sizeof(struct user_regs_struct); ui += sizeof(long)) {
__get_user(tmp, (unsigned long *) data); __get_user(tmp, (unsigned long *) data);
putreg(child, i, tmp); putreg(child, ui, tmp);
data += sizeof(long); data += sizeof(long);
} }
ret = 0; ret = 0;
......
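The PTRACE_POKEUSR path for u_debugreg[7] above validates the new DR7 value with the expression (0x5454 >> ((data >> (16 + 4*i)) & 0xf)) & 1. A small stand-alone sketch of that lookup follows; sketch_dr7_fields_ok is a hypothetical name, and the reading of each 4-bit field as R/W in the low two bits and LEN in the high two bits is my understanding of the DR7 layout rather than something stated in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if all four 4-bit control fields in bits 16..31 of a DR7
 * value pass the 0x5454 bitmap lookup, 0 if any field is rejected.
 * 0x5454 has bits 2, 4, 6, 10, 12 and 14 set, so exactly those field
 * values are refused.
 */
static int sketch_dr7_fields_ok(uint64_t data)
{
	int i;

	for (i = 0; i < 4; i++) {
		unsigned field = (unsigned)((data >> (16 + 4 * i)) & 0xf);

		if ((0x5454 >> field) & 1)
			return 0;
	}
	return 1;
}
```

The constant works as a 16-entry truth table indexed by the field value. The committed code breaks out of the loop on the first rejected field and only stores the value when the loop ran to i == 4; the early return here expresses the same decision.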
...@@ -9,7 +9,9 @@ ...@@ -9,7 +9,9 @@
#include <asm/kdebug.h> #include <asm/kdebug.h>
#include <asm/delay.h> #include <asm/delay.h>
#include <asm/hw_irq.h> #include <asm/hw_irq.h>
#include <asm/system.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
/* /*
* Power off function, if any * Power off function, if any
...@@ -17,35 +19,37 @@ ...@@ -17,35 +19,37 @@
void (*pm_power_off)(void); void (*pm_power_off)(void);
static long no_idt[3]; static long no_idt[3];
static int reboot_mode; static enum {
BOOT_BIOS = 'b',
#ifdef CONFIG_SMP BOOT_TRIPLE = 't',
int reboot_smp = 0; BOOT_KBD = 'k'
static int reboot_cpu = -1; } reboot_type = BOOT_KBD;
#endif static int reboot_mode = 0;
/* reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old]
bios Use the CPU reboot vector for warm reset
warm Don't set the cold reboot flag
cold Set the cold reboot flag
triple Force a triple fault (init)
kbd Use the keyboard controller. cold reset (default)
*/
static int __init reboot_setup(char *str) static int __init reboot_setup(char *str)
{ {
while(1) { for (;;) {
switch (*str) { switch (*str) {
case 'w': /* "warm" reboot (no memory testing etc) */ case 'w':
reboot_mode = 0x1234; reboot_mode = 0x1234;
break; break;
case 'c': /* "cold" reboot (with memory testing etc) */
reboot_mode = 0x0; case 'c':
reboot_mode = 0;
break; break;
#ifdef CONFIG_SMP
case 's': /* "smp" reboot by executing reset on BSP or other CPU*/ case 't':
reboot_smp = 1; case 'b':
if (isdigit(str[1])) case 'k':
sscanf(str+1, "%d", &reboot_cpu); reboot_type = *str;
else if (!strncmp(str,"smp",3))
sscanf(str+3, "%d", &reboot_cpu);
/* we will leave sorting out the final value
when we are ready to reboot, since we might not
have set up boot_cpu_id or smp_num_cpu */
break; break;
#endif
} }
if((str = strchr(str,',')) != NULL) if((str = strchr(str,',')) != NULL)
str++; str++;
...@@ -57,6 +61,56 @@ static int __init reboot_setup(char *str) ...@@ -57,6 +61,56 @@ static int __init reboot_setup(char *str)
__setup("reboot=", reboot_setup); __setup("reboot=", reboot_setup);
/* overwrites random kernel memory. Should not be kernel .text */
#define WARMBOOT_TRAMP 0x1000UL
static void reboot_warm(void)
{
extern unsigned char warm_reboot[], warm_reboot_end[];
printk("warm reboot\n");
local_irq_disable();
/* restore identity mapping */
init_level4_pgt[0] = __pml4(__pa(level3_ident_pgt) | 7);
__flush_tlb_all();
/* Move the trampoline to low memory */
memcpy(__va(WARMBOOT_TRAMP), warm_reboot, warm_reboot_end - warm_reboot);
/* Start it in compatibility mode. */
asm volatile( " pushq $0\n" /* ss */
" pushq $0x2000\n" /* rsp */
" pushfq\n" /* eflags */
" pushq %[cs]\n"
" pushq %[target]\n"
" iretq" ::
[cs] "i" (__KERNEL_COMPAT32_CS),
[target] "b" (WARMBOOT_TRAMP));
}
#ifdef CONFIG_SMP
static void smp_halt(void)
{
int cpuid = safe_smp_processor_id();
/* Only run this on the boot processor */
if (cpuid != boot_cpu_id) {
static int first_entry = 1;
if (first_entry) {
first_entry = 0;
smp_call_function((void *)machine_restart, NULL, 1, 0);
} else {
/* AP reentering. just halt */
for(;;)
asm volatile("hlt");
}
}
smp_send_stop();
}
#endif
static inline void kb_wait(void) static inline void kb_wait(void)
{ {
int i; int i;
...@@ -68,48 +122,24 @@ static inline void kb_wait(void) ...@@ -68,48 +122,24 @@ static inline void kb_wait(void)
void machine_restart(char * __unused) void machine_restart(char * __unused)
{ {
#ifdef CONFIG_SMP int i;
int cpuid;
cpuid = GET_APIC_ID(apic_read(APIC_ID));
if (reboot_smp) {
/* check to see if reboot_cpu is valid
if its not, default to the BSP */
if ((reboot_cpu == -1) ||
(reboot_cpu > (NR_CPUS -1)) ||
!(phys_cpu_present_map & (1<<cpuid)))
reboot_cpu = boot_cpu_id;
reboot_smp = 0; /* use this as a flag to only go through this once*/
/* re-run this function on the other CPUs
it will fall though this section since we have
cleared reboot_smp, and do the reboot if it is the
correct CPU, otherwise it halts. */
if (reboot_cpu != cpuid)
smp_call_function((void *)machine_restart , NULL, 1, 0);
}
/* if reboot_cpu is still -1, then we want a tradional reboot, #if CONFIG_SMP
and if we are not running on the reboot_cpu,, halt */ smp_halt();
if ((reboot_cpu != -1) && (cpuid != reboot_cpu)) {
for (;;)
__asm__ __volatile__ ("hlt");
}
/*
* Stop all CPUs and turn off local APICs and the IO-APIC, so
* other OSs see a clean IRQ state.
*/
smp_send_stop();
disable_IO_APIC();
#endif #endif
/* rebooting needs to touch the page at absolute addr 0 */ disable_IO_APIC();
/* Tell the BIOS if we want cold or warm reboot */
*((unsigned short *)__va(0x472)) = reboot_mode; *((unsigned short *)__va(0x472)) = reboot_mode;
for (;;) { for (;;) {
int i; /* Could also try the reset bit in the Hammer NB */
/* First fondle with the keyboard controller. */ switch (reboot_type) {
case BOOT_BIOS:
reboot_warm();
case BOOT_KBD:
for (i=0; i<100; i++) { for (i=0; i<100; i++) {
kb_wait(); kb_wait();
udelay(50); udelay(50);
...@@ -117,11 +147,13 @@ void machine_restart(char * __unused) ...@@ -117,11 +147,13 @@ void machine_restart(char * __unused)
udelay(50); udelay(50);
} }
/* Could do reset through the northbridge of Hammer here. */ case BOOT_TRIPLE:
/* That didn't work - force a triple fault.. */
__asm__ __volatile__("lidt %0": :"m" (no_idt)); __asm__ __volatile__("lidt %0": :"m" (no_idt));
__asm__ __volatile__("int3"); __asm__ __volatile__("int3");
reboot_type = BOOT_KBD;
break;
}
} }
} }
......
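The rewritten reboot_setup above looks only at the first character of each comma-separated token, so reboot=b,warm and reboot=bios,w select the same behaviour. The sketch below models that parsing in user space. The struct and all sketch_*/SK_* names are hypothetical, and the loop tail (advance past the next comma or stop) is assumed from the part of the function that the hunk does not show.

```c
#include <assert.h>
#include <string.h>

enum sketch_reboot_type { SK_BIOS = 'b', SK_TRIPLE = 't', SK_KBD = 'k' };

struct sketch_reboot_opts {
	enum sketch_reboot_type type;	/* how to reset the machine */
	int mode;			/* 0x1234 = warm, 0 = cold */
};

/* Parse "b[ios] | t[riple] | k[bd] [, w[arm] | c[old]]". */
static struct sketch_reboot_opts sketch_parse_reboot(const char *str)
{
	/* Defaults match the diff: keyboard controller, cold reset. */
	struct sketch_reboot_opts o = { SK_KBD, 0 };

	assert(str != NULL);
	for (;;) {
		switch (*str) {
		case 'w':
			o.mode = 0x1234;
			break;
		case 'c':
			o.mode = 0;
			break;
		case 't':
		case 'b':
		case 'k':
			/* The enum values are the option letters themselves. */
			o.type = (enum sketch_reboot_type)*str;
			break;
		}
		str = strchr(str, ',');
		if (str == NULL)
			break;
		str++;
	}
	return o;
}
```

Using the option letter as the enum value is what lets the committed code assign reboot_type = *str directly, without a lookup table.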
...@@ -201,6 +201,11 @@ static __init void parse_cmdline_early (char ** cmdline_p) ...@@ -201,6 +201,11 @@ static __init void parse_cmdline_early (char ** cmdline_p)
if (!memcmp(from, "mem=", 4)) if (!memcmp(from, "mem=", 4))
parse_memopt(from+4, &from); parse_memopt(from+4, &from);
#ifdef CONFIG_DISCONTIGMEM
if (!memcmp(from, "numa=", 5))
numa_setup(from+5);
#endif
#ifdef CONFIG_GART_IOMMU #ifdef CONFIG_GART_IOMMU
if (!memcmp(from,"iommu=",6)) { if (!memcmp(from,"iommu=",6)) {
iommu_setup(from+6); iommu_setup(from+6);
...@@ -236,8 +241,6 @@ static void __init contig_initmem_init(void) ...@@ -236,8 +241,6 @@ static void __init contig_initmem_init(void)
void __init setup_arch(char **cmdline_p) void __init setup_arch(char **cmdline_p)
{ {
int i;
Dprintk("setup_arch\n"); Dprintk("setup_arch\n");
ROOT_DEV = ORIG_ROOT_DEV; ROOT_DEV = ORIG_ROOT_DEV;
...@@ -367,9 +370,12 @@ void __init setup_arch(char **cmdline_p) ...@@ -367,9 +370,12 @@ void __init setup_arch(char **cmdline_p)
request_resource(&iomem_resource, &vram_resource); request_resource(&iomem_resource, &vram_resource);
{
unsigned i;
/* request I/O space for devices used on all i[345]86 PCs */ /* request I/O space for devices used on all i[345]86 PCs */
for (i = 0; i < STANDARD_IO_RESOURCES; i++) for (i = 0; i < STANDARD_IO_RESOURCES; i++)
request_resource(&ioport_resource, standard_io_resources+i); request_resource(&ioport_resource, standard_io_resources+i);
}
pci_mem_start = IOMAP_START; pci_mem_start = IOMAP_START;
...@@ -694,7 +700,7 @@ static int show_cpuinfo(struct seq_file *m, void *v) ...@@ -694,7 +700,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
seq_printf(m, "power management:"); seq_printf(m, "power management:");
{ {
int i; unsigned i;
for (i = 0; i < 32; i++) for (i = 0; i < 32; i++)
if (c->x86_power & (1 << i)) { if (c->x86_power & (1 << i)) {
if (i < ARRAY_SIZE(x86_power_flags)) if (i < ARRAY_SIZE(x86_power_flags))
......
...@@ -23,6 +23,7 @@ ...@@ -23,6 +23,7 @@
#include <asm/percpu.h> #include <asm/percpu.h>
#include <asm/mtrr.h> #include <asm/mtrr.h>
#include <asm/proto.h> #include <asm/proto.h>
#include <asm/mman.h>
char x86_boot_params[2048] __initdata = {0,}; char x86_boot_params[2048] __initdata = {0,};
...@@ -40,22 +41,79 @@ struct desc_ptr idt_descr = { 256 * 16, (unsigned long) idt_table }; ...@@ -40,22 +41,79 @@ struct desc_ptr idt_descr = { 256 * 16, (unsigned long) idt_table };
char boot_cpu_stack[IRQSTACKSIZE] __cacheline_aligned; char boot_cpu_stack[IRQSTACKSIZE] __cacheline_aligned;
unsigned long __supported_pte_mask = ~0UL; unsigned long __supported_pte_mask = ~0UL;
static int do_not_nx = 1; static int do_not_nx __initdata = 0;
unsigned long vm_stack_flags = __VM_STACK_FLAGS;
unsigned long vm_stack_flags32 = __VM_STACK_FLAGS;
unsigned long vm_data_default_flags = __VM_DATA_DEFAULT_FLAGS;
unsigned long vm_data_default_flags32 = __VM_DATA_DEFAULT_FLAGS;
unsigned long vm_force_exec32 = PROT_EXEC;
/* noexec=on|off|noforce
Control non executable mappings for 64bit processes.
on Enable
off Disable
noforce (default) Don't enable by default for heap/stack/data,
but allow PROT_EXEC to be effective
*/
static int __init nonx_setup(char *str) static int __init nonx_setup(char *str)
{ {
if (!strncmp(str,"off",3)) { if (!strncmp(str, "on",3)) {
__supported_pte_mask &= ~_PAGE_NX;
do_not_nx = 1;
} else if (!strncmp(str, "on",3)) {
do_not_nx = 0;
__supported_pte_mask |= _PAGE_NX; __supported_pte_mask |= _PAGE_NX;
do_not_nx = 0;
vm_data_default_flags &= ~VM_EXEC;
vm_stack_flags &= ~VM_EXEC;
} else if (!strncmp(str, "noforce",7) || !strncmp(str,"off",3)) {
do_not_nx = (str[0] == 'o');
if (do_not_nx)
__supported_pte_mask &= ~_PAGE_NX;
vm_data_default_flags |= VM_EXEC;
vm_stack_flags |= VM_EXEC;
} }
return 1; return 1;
} }
__setup("noexec=", nonx_setup); __setup("noexec=", nonx_setup);
/* noexec32=opt{,opt}
Control the no exec default for 32bit processes. Can also be overridden
per executable using ELF header flags (e.g. needed for the X server)
Requires noexec=on or noexec=noforce to be effective.
Valid options:
all,on Heap,stack,data is non executable.
off (default) Heap,stack,data is executable
stack Stack is non executable, heap/data is.
force Don't imply PROT_EXEC for PROT_READ
compat (default) Imply PROT_EXEC for PROT_READ
*/
static int __init nonx32_setup(char *str)
{
char *s;
while ((s = strsep(&str, ",")) != NULL) {
if (!strcmp(s, "all") || !strcmp(s,"on")) {
vm_data_default_flags32 &= ~VM_EXEC;
vm_stack_flags32 &= ~VM_EXEC;
} else if (!strcmp(s, "off")) {
vm_data_default_flags32 |= VM_EXEC;
vm_stack_flags32 |= VM_EXEC;
} else if (!strcmp(s, "stack")) {
vm_data_default_flags32 |= VM_EXEC;
vm_stack_flags32 &= ~VM_EXEC;
} else if (!strcmp(s, "force")) {
vm_force_exec32 = 0;
} else if (!strcmp(s, "compat")) {
vm_force_exec32 = PROT_EXEC;
}
}
return 1;
}
__setup("noexec32=", nonx32_setup);
#ifndef __GENERIC_PER_CPU #ifndef __GENERIC_PER_CPU
unsigned long __per_cpu_offset[NR_CPUS]; unsigned long __per_cpu_offset[NR_CPUS];
......
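The noexec32= handler above applies its comma-separated options left to right, so a later option overrides an earlier one. Here is a hedged user-space model of that behaviour. It uses a hand-written splitter instead of strsep to stay within standard C; the SK_* constants are stand-ins for VM_EXEC and PROT_EXEC, and the starting state (exec bit set, "compat") is an assumption taken from the defaults named in the option comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_VM_EXEC   0x4UL	/* hypothetical stand-in for VM_EXEC */
#define SK_PROT_EXEC 0x4UL	/* hypothetical stand-in for PROT_EXEC */

struct sketch_nx32 {
	unsigned long data_flags;	/* models vm_data_default_flags32 */
	unsigned long stack_flags;	/* models vm_stack_flags32 */
	unsigned long force_exec;	/* models vm_force_exec32 */
};

/* Apply one option token of the given length to the state. */
static void sketch_apply_token(struct sketch_nx32 *s, const char *tok, size_t len)
{
#define SK_IS(lit) (len == sizeof(lit) - 1 && memcmp(tok, lit, len) == 0)
	if (SK_IS("all") || SK_IS("on")) {
		s->data_flags &= ~SK_VM_EXEC;
		s->stack_flags &= ~SK_VM_EXEC;
	} else if (SK_IS("off")) {
		s->data_flags |= SK_VM_EXEC;
		s->stack_flags |= SK_VM_EXEC;
	} else if (SK_IS("stack")) {
		s->data_flags |= SK_VM_EXEC;
		s->stack_flags &= ~SK_VM_EXEC;
	} else if (SK_IS("force")) {
		s->force_exec = 0;
	} else if (SK_IS("compat")) {
		s->force_exec = SK_PROT_EXEC;
	}
	/* Unknown tokens are ignored, as in the diff. */
#undef SK_IS
}

static struct sketch_nx32 sketch_parse_noexec32(const char *str)
{
	struct sketch_nx32 s = { SK_VM_EXEC, SK_VM_EXEC, SK_PROT_EXEC };

	assert(str != NULL);
	while (str != NULL) {
		const char *comma = strchr(str, ',');
		size_t len = comma ? (size_t)(comma - str) : strlen(str);

		sketch_apply_token(&s, str, len);
		str = comma ? comma + 1 : NULL;
	}
	return s;
}
```

For example, "stack,force" leaves data executable by default, makes the stack non-executable, and stops PROT_READ from implying PROT_EXEC.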
...@@ -371,7 +371,7 @@ handle_signal(unsigned long sig, siginfo_t *info, sigset_t *oldset, ...@@ -371,7 +371,7 @@ handle_signal(unsigned long sig, siginfo_t *info, sigset_t *oldset,
regs->rax = regs->orig_rax; regs->rax = regs->orig_rax;
regs->rip -= 2; regs->rip -= 2;
} }
if (regs->rax == -ERESTART_RESTARTBLOCK){ if (regs->rax == (unsigned long)-ERESTART_RESTARTBLOCK){
regs->rax = __NR_restart_syscall; regs->rax = __NR_restart_syscall;
regs->rip -= 2; regs->rip -= 2;
} }
...@@ -434,8 +434,8 @@ int do_signal(struct pt_regs *regs, sigset_t *oldset) ...@@ -434,8 +434,8 @@ int do_signal(struct pt_regs *regs, sigset_t *oldset)
* have been cleared if the watchpoint triggered * have been cleared if the watchpoint triggered
* inside the kernel. * inside the kernel.
*/ */
if (current->thread.debugreg[7]) if (current->thread.debugreg7)
asm volatile("movq %0,%%db7" : : "r" (current->thread.debugreg[7])); asm volatile("movq %0,%%db7" : : "r" (current->thread.debugreg7));
/* Whee! Actually deliver the signal. */ /* Whee! Actually deliver the signal. */
handle_signal(signr, &info, oldset, regs); handle_signal(signr, &info, oldset, regs);
...@@ -446,9 +446,10 @@ int do_signal(struct pt_regs *regs, sigset_t *oldset) ...@@ -446,9 +446,10 @@ int do_signal(struct pt_regs *regs, sigset_t *oldset)
/* Did we come from a system call? */ /* Did we come from a system call? */
if (regs->orig_rax >= 0) { if (regs->orig_rax >= 0) {
/* Restart the system call - no handlers present */ /* Restart the system call - no handlers present */
if (regs->rax == -ERESTARTNOHAND || long res = regs->rax;
regs->rax == -ERESTARTSYS || if (res == -ERESTARTNOHAND ||
regs->rax == -ERESTARTNOINTR) { res == -ERESTARTSYS ||
res == -ERESTARTNOINTR) {
regs->rax = regs->orig_rax; regs->rax = regs->orig_rax;
regs->rip -= 2; regs->rip -= 2;
} }
......
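The do_signal hunk above copies regs->rax into a signed long before comparing it against the negative -ERESTART* codes, and handle_signal casts the constant to unsigned long instead. Both make the signedness of the comparison explicit. The sketch below illustrates the signed-copy variant; the numeric values are assumptions (I believe they match the kernel's errno definitions, but the diff does not show them), and sketch_should_restart is a hypothetical name.

```c
#include <assert.h>

/* Assumed values of the kernel-internal restart codes. */
#define SK_ERESTARTSYS     512
#define SK_ERESTARTNOINTR  513
#define SK_ERESTARTNOHAND  514

/*
 * rax is stored as an unsigned register value. Converting it to long
 * first means every comparison below is signed against signed. (The
 * conversion of large unsigned values is implementation-defined in ISO C
 * but wraps as expected on the two's-complement targets this models.)
 */
static int sketch_should_restart(unsigned long rax)
{
	long res = (long)rax;

	return res == -SK_ERESTARTNOHAND ||
	       res == -SK_ERESTARTSYS ||
	       res == -SK_ERESTARTNOINTR;
}
```

The old unsigned-versus-int comparison yields the same result through the usual arithmetic conversions, so the change most likely exists to silence sign-comparison warnings rather than to alter behaviour.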
...@@ -42,6 +42,7 @@ ...@@ -42,6 +42,7 @@
#include <linux/smp_lock.h> #include <linux/smp_lock.h>
#include <linux/irq.h> #include <linux/irq.h>
#include <linux/bootmem.h> #include <linux/bootmem.h>
#include <linux/thread_info.h>
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/mc146818rtc.h> #include <linux/mc146818rtc.h>
...@@ -123,7 +124,7 @@ static void __init synchronize_tsc_bp (void) ...@@ -123,7 +124,7 @@ static void __init synchronize_tsc_bp (void)
unsigned long long t0; unsigned long long t0;
unsigned long long sum, avg; unsigned long long sum, avg;
long long delta; long long delta;
unsigned long one_usec; long one_usec;
int buggy = 0; int buggy = 0;
extern unsigned cpu_khz; extern unsigned cpu_khz;
...@@ -339,7 +340,7 @@ extern int cpu_idle(void); ...@@ -339,7 +340,7 @@ extern int cpu_idle(void);
/* /*
* Activate a secondary processor. * Activate a secondary processor.
*/ */
int __init start_secondary(void *unused) void __init start_secondary(void)
{ {
/* /*
* Dont put anything before smp_callin(), SMP * Dont put anything before smp_callin(), SMP
...@@ -380,29 +381,7 @@ int __init start_secondary(void *unused) ...@@ -380,29 +381,7 @@ int __init start_secondary(void *unused)
set_bit(smp_processor_id(), &cpu_online_map); set_bit(smp_processor_id(), &cpu_online_map);
wmb(); wmb();
return cpu_idle(); cpu_idle();
}
/*
* Everything has been set up for the secondary
* CPUs - they just need to reload everything
* from the task structure
* This function must not return.
*/
void __init initialize_secondary(void)
{
struct task_struct *me = stack_current();
/*
* We don't actually need to load the full TSS,
* basically just the stack pointer and the eip.
*/
asm volatile(
"movq %0,%%rsp\n\t"
"jmp *%1"
:
:"r" (me->thread.rsp),"r" (me->thread.rip));
} }
extern volatile unsigned long init_rsp; extern volatile unsigned long init_rsp;
...@@ -412,16 +391,16 @@ static struct task_struct * __init fork_by_hand(void) ...@@ -412,16 +391,16 @@ static struct task_struct * __init fork_by_hand(void)
{ {
struct pt_regs regs; struct pt_regs regs;
/* /*
* don't care about the rip and regs settings since * don't care about the eip and regs settings since
* we'll never reschedule the forked task. * we'll never reschedule the forked task.
*/ */
return do_fork(CLONE_VM|CLONE_IDLETASK, 0, &regs, 0, NULL, NULL); return copy_process(CLONE_VM|CLONE_IDLETASK, 0, &regs, 0, NULL, NULL);
} }
#if APIC_DEBUG #if APIC_DEBUG
static inline void inquire_remote_apic(int apicid) static inline void inquire_remote_apic(int apicid)
{ {
int i, regs[] = { APIC_ID >> 4, APIC_LVR >> 4, APIC_SPIV >> 4 }; unsigned i, regs[] = { APIC_ID >> 4, APIC_LVR >> 4, APIC_SPIV >> 4 };
char *names[] = { "ID", "VERSION", "SPIV" }; char *names[] = { "ID", "VERSION", "SPIV" };
int timeout, status; int timeout, status;
...@@ -596,6 +575,7 @@ static void __init do_boot_cpu (int apicid) ...@@ -596,6 +575,7 @@ static void __init do_boot_cpu (int apicid)
idle = fork_by_hand(); idle = fork_by_hand();
if (IS_ERR(idle)) if (IS_ERR(idle))
panic("failed fork for CPU %d", cpu); panic("failed fork for CPU %d", cpu);
wake_up_forked_process(idle);
/* /*
* We remove it from the pidhash and the runqueue * We remove it from the pidhash and the runqueue
...@@ -603,22 +583,19 @@ static void __init do_boot_cpu (int apicid) ...@@ -603,22 +583,19 @@ static void __init do_boot_cpu (int apicid)
*/ */
init_idle(idle,cpu); init_idle(idle,cpu);
idle->thread.rip = (unsigned long)start_secondary;
// idle->thread.rsp = (unsigned long)idle->thread_info + THREAD_SIZE - 512;
unhash_process(idle); unhash_process(idle);
cpu_pda[cpu].pcurrent = idle; cpu_pda[cpu].pcurrent = idle;
/* start_eip had better be page-aligned! */
start_rip = setup_trampoline(); start_rip = setup_trampoline();
init_rsp = (unsigned long)idle->thread_info + PAGE_SIZE + 1024; init_rsp = idle->thread.rsp;
init_tss[cpu].rsp0 = init_rsp; init_tss[cpu].rsp0 = init_rsp;
initial_code = initialize_secondary; initial_code = start_secondary;
clear_ti_thread_flag(idle->thread_info, TIF_FORK);
printk(KERN_INFO "Booting processor %d/%d rip %lx rsp %lx rsp2 %lx\n", cpu, apicid, printk(KERN_INFO "Booting processor %d/%d rip %lx rsp %lx\n", cpu, apicid,
start_rip, idle->thread.rsp, init_rsp); start_rip, init_rsp);
/* /*
* This grunge runs the startup process for * This grunge runs the startup process for
...@@ -676,7 +653,7 @@ static void __init do_boot_cpu (int apicid) ...@@ -676,7 +653,7 @@ static void __init do_boot_cpu (int apicid)
if (test_bit(cpu, &cpu_callin_map)) { if (test_bit(cpu, &cpu_callin_map)) {
/* number CPUs logically, starting from 1 (BSP is 0) */ /* number CPUs logically, starting from 1 (BSP is 0) */
Dprintk("OK.\n"); Dprintk("OK.\n");
printk("KERN_INFO CPU%d: ", cpu); printk(KERN_INFO "CPU%d: ", cpu);
print_cpu_info(&cpu_data[cpu]); print_cpu_info(&cpu_data[cpu]);
Dprintk("CPU has booted.\n"); Dprintk("CPU has booted.\n");
} else { } else {
...@@ -708,7 +685,7 @@ unsigned long cache_decay_ticks; ...@@ -708,7 +685,7 @@ unsigned long cache_decay_ticks;
static void smp_tune_scheduling (void) static void smp_tune_scheduling (void)
{ {
unsigned long cachesize; /* kB */ int cachesize; /* kB */
unsigned long bandwidth = 1000; /* MB/s */ unsigned long bandwidth = 1000; /* MB/s */
/* /*
* Rough estimation for SMP scheduling, this is the number of * Rough estimation for SMP scheduling, this is the number of
...@@ -753,7 +730,7 @@ static void smp_tune_scheduling (void) ...@@ -753,7 +730,7 @@ static void smp_tune_scheduling (void)
static void __init smp_boot_cpus(unsigned int max_cpus) static void __init smp_boot_cpus(unsigned int max_cpus)
{ {
int apicid, cpu; unsigned apicid, cpu;
/* /*
* Setup boot CPU information * Setup boot CPU information
......
...@@ -117,5 +117,5 @@ asmlinkage long sys_uname(struct new_utsname * name) ...@@ -117,5 +117,5 @@ asmlinkage long sys_uname(struct new_utsname * name)
asmlinkage long wrap_sys_shmat(int shmid, char *shmaddr, int shmflg) asmlinkage long wrap_sys_shmat(int shmid, char *shmaddr, int shmflg)
{ {
unsigned long raddr; unsigned long raddr;
return sys_shmat(shmid,shmaddr,shmflg,&raddr) ?: raddr; return sys_shmat(shmid,shmaddr,shmflg,&raddr) ?: (long)raddr;
} }
...@@ -584,12 +584,12 @@ asmlinkage void do_debug(struct pt_regs * regs, long error_code) ...@@ -584,12 +584,12 @@ asmlinkage void do_debug(struct pt_regs * regs, long error_code)
/* Mask out spurious debug traps due to lazy DR7 setting */ /* Mask out spurious debug traps due to lazy DR7 setting */
if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) { if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
if (!tsk->thread.debugreg[7]) { if (!tsk->thread.debugreg7) {
goto clear_dr7; goto clear_dr7;
} }
} }
tsk->thread.debugreg[6] = condition; tsk->thread.debugreg6 = condition;
/* Mask out spurious TF errors due to lazy TF clearing */ /* Mask out spurious TF errors due to lazy TF clearing */
if (condition & DR_STEP) { if (condition & DR_STEP) {
......
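Note on the debugreg hunk above: the commit message says that converting the `debugreg` array to individual members saves 16 bytes per task. The sketch below shows where that figure could come from. Both struct layouts are assumptions made for illustration (the real `struct thread_struct` is not shown on this page), as is the idea that the DR4 and DR5 slots were simply unused.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: one 64-bit slot per DR0..DR7,
 * including slots for DR4 and DR5 that are assumed to be unused. */
struct debug_state_array {
    uint64_t debugreg[8];
};

/* Hypothetical "after" layout: only the registers the diff touches
 * by name (debugreg6, debugreg7) plus the four address registers. */
struct debug_state_members {
    uint64_t debugreg0, debugreg1, debugreg2, debugreg3;
    uint64_t debugreg6;
    uint64_t debugreg7;
};

/* Bytes saved per task by dropping the two unused slots. */
static size_t debugreg_bytes_saved(void)
{
    return sizeof(struct debug_state_array) -
           sizeof(struct debug_state_members);
}
```

Under these assumptions the difference is two 8-byte slots, which matches the 16 bytes quoted in the commit message.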
/*
* Code for the vsyscall page. This version uses the syscall instruction.
*/
#include <asm/ia32_unistd.h>
#include <asm/offset.h>
.text
.globl __kernel_vsyscall
.type __kernel_vsyscall,@function
__kernel_vsyscall:
.LSTART_vsyscall:
push %ebp
.Lpush_ebp:
movl %ecx, %ebp
syscall
popl %ebp
.Lpop_ebp:
ret
.LEND_vsyscall:
.size __kernel_vsyscall,.-.LSTART_vsyscall
.balign 32
.globl __kernel_sigreturn
.type __kernel_sigreturn,@function
__kernel_sigreturn:
.LSTART_sigreturn:
popl %eax
movl $__NR_ia32_sigreturn, %eax
syscall
.LEND_sigreturn:
.size __kernel_sigreturn,.-.LSTART_sigreturn
.balign 32
.globl __kernel_rt_sigreturn
.type __kernel_rt_sigreturn,@function
__kernel_rt_sigreturn:
.LSTART_rt_sigreturn:
movl $__NR_ia32_rt_sigreturn, %eax
syscall
.LEND_rt_sigreturn:
.size __kernel_rt_sigreturn,.-.LSTART_rt_sigreturn
.section .eh_frame,"a",@progbits
.LSTARTFRAME:
.long .LENDCIE-.LSTARTCIE
.LSTARTCIE:
.long 0 /* CIE ID */
.byte 1 /* Version number */
.string "zR" /* NUL-terminated augmentation string */
.uleb128 1 /* Code alignment factor */
.sleb128 -4 /* Data alignment factor */
.byte 8 /* Return address register column */
.uleb128 1 /* Augmentation value length */
.byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
.byte 0x0c /* DW_CFA_def_cfa */
.uleb128 4
.uleb128 4
.byte 0x88 /* DW_CFA_offset, column 0x8 */
.uleb128 1
.align 4
.LENDCIE:
.long .LENDFDE1-.LSTARTFDE1 /* Length FDE */
.LSTARTFDE1:
.long .LSTARTFDE1-.LSTARTFRAME /* CIE pointer */
.long .LSTART_vsyscall-. /* PC-relative start address */
.long .LEND_vsyscall-.LSTART_vsyscall
.uleb128 0 /* Augmentation length */
/* What follows are the instructions for the table generation.
We have to record all changes of the stack pointer. */
.byte 0x40 + .Lpush_ebp-.LSTART_vsyscall /* DW_CFA_advance_loc */
.byte 0x0e /* DW_CFA_def_cfa_offset */
.uleb128 8
.byte 0x85, 0x02 /* DW_CFA_offset %ebp -8 */
.byte 0x40 + .Lpop_ebp-.Lpush_ebp /* DW_CFA_advance_loc */
.byte 0xc5 /* DW_CFA_restore %ebp */
.byte 0x0e /* DW_CFA_def_cfa_offset */
.uleb128 4
.align 4
.LENDFDE1:
.long .LENDFDE2-.LSTARTFDE2 /* Length FDE */
.LSTARTFDE2:
.long .LSTARTFDE2-.LSTARTFRAME /* CIE pointer */
/* HACK: The dwarf2 unwind routines will subtract 1 from the
return address to get an address in the middle of the
presumed call instruction. Since we didn't get here via
a call, we need to include the nop before the real start
to make up for it. */
.long .LSTART_sigreturn-1-. /* PC-relative start address */
.long .LEND_sigreturn-.LSTART_sigreturn+1
.uleb128 0 /* Augmentation length */
/* What follows are the instructions for the table generation.
We record the locations of each register saved. This is
complicated by the fact that the "CFA" is always assumed to
be the value of the stack pointer in the caller. This means
that we must define the CFA of this body of code to be the
saved value of the stack pointer in the sigcontext. Which
also means that there is no fixed relation to the other
saved registers, which means that we must use DW_CFA_expression
to compute their addresses. It also means that when we
adjust the stack with the popl, we have to do it all over again. */
#define do_cfa_expr(offset) \
.byte 0x0f; /* DW_CFA_def_cfa_expression */ \
.uleb128 1f-0f; /* length */ \
0: .byte 0x74; /* DW_OP_breg4 */ \
.sleb128 offset; /* offset */ \
.byte 0x06; /* DW_OP_deref */ \
1:
#define do_expr(regno, offset) \
.byte 0x10; /* DW_CFA_expression */ \
.uleb128 regno; /* regno */ \
.uleb128 1f-0f; /* length */ \
0: .byte 0x74; /* DW_OP_breg4 */ \
.sleb128 offset; /* offset */ \
1:
do_cfa_expr(IA32_SIGCONTEXT_esp+4)
do_expr(0, IA32_SIGCONTEXT_eax+4)
do_expr(1, IA32_SIGCONTEXT_ecx+4)
do_expr(2, IA32_SIGCONTEXT_edx+4)
do_expr(3, IA32_SIGCONTEXT_ebx+4)
do_expr(5, IA32_SIGCONTEXT_ebp+4)
do_expr(6, IA32_SIGCONTEXT_esi+4)
do_expr(7, IA32_SIGCONTEXT_edi+4)
do_expr(8, IA32_SIGCONTEXT_eip+4)
.byte 0x42 /* DW_CFA_advance_loc 2 -- nop; popl eax. */
do_cfa_expr(IA32_SIGCONTEXT_esp)
do_expr(0, IA32_SIGCONTEXT_eax)
do_expr(1, IA32_SIGCONTEXT_ecx)
do_expr(2, IA32_SIGCONTEXT_edx)
do_expr(3, IA32_SIGCONTEXT_ebx)
do_expr(5, IA32_SIGCONTEXT_ebp)
do_expr(6, IA32_SIGCONTEXT_esi)
do_expr(7, IA32_SIGCONTEXT_edi)
do_expr(8, IA32_SIGCONTEXT_eip)
.align 4
.LENDFDE2:
.long .LENDFDE3-.LSTARTFDE3 /* Length FDE */
.LSTARTFDE3:
.long .LSTARTFDE3-.LSTARTFRAME /* CIE pointer */
/* HACK: See above wrt unwind library assumptions. */
.long .LSTART_rt_sigreturn-1-. /* PC-relative start address */
.long .LEND_rt_sigreturn-.LSTART_rt_sigreturn+1
.uleb128 0 /* Augmentation */
/* What follows are the instructions for the table generation.
We record the locations of each register saved. This is
slightly less complicated than the above, since we don't
modify the stack pointer in the process. */
do_cfa_expr(IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_esp)
do_expr(0, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_eax)
do_expr(1, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ecx)
do_expr(2, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_edx)
do_expr(3, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ebx)
do_expr(5, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_ebp)
do_expr(6, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_esi)
do_expr(7, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_edi)
do_expr(8, IA32_RT_SIGFRAME_sigcontext-4 + IA32_SIGCONTEXT_eip)
.align 4
.LENDFDE3:
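Note on the hand-assembled `.eh_frame` bytes above: several of the raw `.byte` values pack a DWARF call-frame opcode into the top two bits and a small operand into the low six bits. The helper names in this sketch are hypothetical; the opcode values are the standard DWARF ones, and the worked numbers are taken from the listing (`0x88`, `0x85, 0x02`, `0xc5`, `0x42`, data alignment factor `-4`).

```c
#include <stdint.h>

/* Primary CFA opcodes: opcode in bits 7..6, operand in bits 5..0. */
enum {
    DW_CFA_advance_loc = 0x1 << 6,  /* 0x40 | code-offset delta */
    DW_CFA_offset      = 0x2 << 6,  /* 0x80 | register column   */
    DW_CFA_restore     = 0x3 << 6   /* 0xc0 | register column   */
};

static uint8_t cfa_advance_loc(unsigned delta)
{
    return (uint8_t)(DW_CFA_advance_loc | (delta & 0x3f));
}

static uint8_t cfa_offset(unsigned regno)
{
    return (uint8_t)(DW_CFA_offset | (regno & 0x3f));
}

static uint8_t cfa_restore(unsigned regno)
{
    return (uint8_t)(DW_CFA_restore | (regno & 0x3f));
}

/* DW_CFA_offset is followed by a factored offset; the byte offset
 * from the CFA is that value times the CIE data alignment factor. */
static int cfa_offset_bytes(unsigned factored, int data_align)
{
    return (int)factored * data_align;
}
```

For example, `.byte 0x85, 0x02` is `cfa_offset(5)` followed by the factored offset 2, which with the CIE's data alignment factor of -4 places `%ebp` at CFA-8, as the comment in the listing states.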
/*
* Switch back to real mode and call the BIOS reboot vector.
* This is a trampoline copied around in process.c
* Written 2003 by Andi Kleen, SuSE Labs.
*/
#include <asm/msr.h>
#define R(x) x-warm_reboot(%ebx)
#define R64(x) x-warm_reboot(%rbx)
/* running in identity mapping and in the first 64k of memory
and in compatibility mode. This must be position independent */
/* Follows 14.7 "Leaving Long Mode" in the AMD x86-64 manual, volume 2
and 8.9.2 "Switching Back to Real-Address Mode" in the Intel IA32
manual, volume 2 */
/* ebx: self pointer to warm_reboot */
.globl warm_reboot
warm_reboot:
addl %ebx, R64(real_mode_desc) /* relocate tables */
addl %ebx,2+R64(warm_gdt_desc)
movq %cr0,%rax
btr $31,%rax
movq %rax,%cr0 /* disable paging */
jmp 1f /* flush prefetch queue */
.code32
1: movl $MSR_EFER,%ecx
rdmsr
andl $~((1<<_EFER_LME)|(1<<_EFER_SCE)|(1<<_EFER_NX)),%eax
wrmsr /* disable long mode in EFER */
xorl %eax,%eax
movl %eax,%cr3 /* flush tlb */
/* Running protected mode without paging now */
wbinvd /* flush caches. Needed? */
lidt R(warm_idt_desc)
lgdt R(warm_gdt_desc)
movl $0x10,%ecx /* load segment registers with real mode settings */
movl %ecx,%ds
movl %ecx,%es
movl %ecx,%fs
movl %ecx,%gs
movl %ecx,%ss
lea R(real_mode_desc),%eax
ljmp *(%eax)
.code16:
real_mode:
xorl %eax,%eax
movl %eax,%cr0
/* some people claim $0xf000,0xfff0 is better. Use what 32bit linux uses. */
/* code as bytes because gas has problems with it */
.byte 0xea,0xf0,0xff,0x00,0xf0 /* ljmp 0xf000:0xfff0 */
real_mode_desc:
.long real_mode - warm_reboot
.short 8
warm_gdt_desc:
.short 8*3
.long warm_gdt - warm_reboot
warm_gdt:
.quad 0
.quad 0x00009a000000ffff /* 16-bit real-mode 64k code at 0x00000000 */
.quad 0x000092000100ffff /* 16-bit real-mode 64k data at 0x00000100 */
warm_idt_desc:
.short 0x3ff
.long 0
.globl warm_reboot_end
warm_reboot_end:
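Note on the `warm_gdt` entries above: the two `.quad` values are legacy 8-byte segment descriptors, and their comments can be checked by unpacking the fields. The function names below are hypothetical; the bit positions follow the standard x86 segment descriptor layout.

```c
#include <stdint.h>

/* Base address: descriptor bits 16..39 and 56..63. */
static uint32_t gdt_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xffffff) | (((d >> 56) & 0xff) << 24));
}

/* Segment limit: descriptor bits 0..15 and 48..51. */
static uint32_t gdt_limit(uint64_t d)
{
    return (uint32_t)((d & 0xffff) | (((d >> 48) & 0xf) << 16));
}

/* Access byte (present, DPL, type): descriptor bits 40..47. */
static uint8_t gdt_access(uint64_t d)
{
    return (uint8_t)((d >> 40) & 0xff);
}
```

Decoding `0x000092000100ffff` this way gives base `0x100`, limit `0xffff` and access byte `0x92` (writable data), which agrees with the "64k data at 0x00000100" comment; `0x00009a000000ffff` decodes to base 0 with access byte `0x9a` (readable code).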
...@@ -207,3 +207,12 @@ EXPORT_SYMBOL(init_level4_pgt); ...@@ -207,3 +207,12 @@ EXPORT_SYMBOL(init_level4_pgt);
extern unsigned long __supported_pte_mask; extern unsigned long __supported_pte_mask;
EXPORT_SYMBOL(__supported_pte_mask); EXPORT_SYMBOL(__supported_pte_mask);
#ifdef CONFIG_DISCONTIGMEM
EXPORT_SYMBOL(memnode_shift);
EXPORT_SYMBOL(memnodemap);
EXPORT_SYMBOL(node_data);
EXPORT_SYMBOL(fake_node);
#endif
EXPORT_SYMBOL(clear_page);
...@@ -6,16 +6,13 @@ ...@@ -6,16 +6,13 @@
#define FIX_ALIGNMENT 1 #define FIX_ALIGNMENT 1
#define movnti movq /* write to cache for now */
#define prefetch prefetcht2
#include <asm/current.h> #include <asm/current.h>
#include <asm/offset.h> #include <asm/offset.h>
#include <asm/thread_info.h> #include <asm/thread_info.h>
/* Standard copy_to_user with segment limit checking */ /* Standard copy_to_user with segment limit checking */
.globl copy_to_user .globl copy_to_user
.p2align .p2align 4
copy_to_user: copy_to_user:
GET_THREAD_INFO(%rax) GET_THREAD_INFO(%rax)
movq %rdi,%rcx movq %rdi,%rcx
...@@ -27,7 +24,7 @@ copy_to_user: ...@@ -27,7 +24,7 @@ copy_to_user:
/* Standard copy_from_user with segment limit checking */ /* Standard copy_from_user with segment limit checking */
.globl copy_from_user .globl copy_from_user
.p2align .p2align 4
copy_from_user: copy_from_user:
GET_THREAD_INFO(%rax) GET_THREAD_INFO(%rax)
movq %rsi,%rcx movq %rsi,%rcx
...@@ -58,23 +55,23 @@ bad_to_user: ...@@ -58,23 +55,23 @@ bad_to_user:
* rdx count * rdx count
* *
* Output: * Output:
* eax uncopied bytes or 0 if successful. * eax uncopied bytes or 0 if successfull.
*/ */
.globl copy_user_generic .globl copy_user_generic
.p2align 4
copy_user_generic: copy_user_generic:
/* Put the first cacheline into cache. This should handle /* Put the first cacheline into cache. This should handle
the small movements in ioctls etc., but not penalize the bigger the small movements in ioctls etc., but not penalize the bigger
filesystem data copies too much. */ filesystem data copies too much. */
pushq %rbx pushq %rbx
prefetch (%rsi)
xorl %eax,%eax /*zero for the exception handler */ xorl %eax,%eax /*zero for the exception handler */
#ifdef FIX_ALIGNMENT #ifdef FIX_ALIGNMENT
/* check for bad alignment of destination */ /* check for bad alignment of destination */
movl %edi,%ecx movl %edi,%ecx
andl $7,%ecx andl $7,%ecx
jnz bad_alignment jnz .Lbad_alignment
after_bad_alignment: .Lafter_bad_alignment:
#endif #endif
movq %rdx,%rcx movq %rdx,%rcx
...@@ -82,133 +79,133 @@ after_bad_alignment: ...@@ -82,133 +79,133 @@ after_bad_alignment:
movl $64,%ebx movl $64,%ebx
shrq $6,%rdx shrq $6,%rdx
decq %rdx decq %rdx
js handle_tail js .Lhandle_tail
jz loop_no_prefetch
.p2align 4
loop: .Lloop:
prefetch 64(%rsi) .Ls1: movq (%rsi),%r11
.Ls2: movq 1*8(%rsi),%r8
loop_no_prefetch: .Ls3: movq 2*8(%rsi),%r9
s1: movq (%rsi),%r11 .Ls4: movq 3*8(%rsi),%r10
s2: movq 1*8(%rsi),%r8 .Ld1: movq %r11,(%rdi)
s3: movq 2*8(%rsi),%r9 .Ld2: movq %r8,1*8(%rdi)
s4: movq 3*8(%rsi),%r10 .Ld3: movq %r9,2*8(%rdi)
d1: movnti %r11,(%rdi) .Ld4: movq %r10,3*8(%rdi)
d2: movnti %r8,1*8(%rdi)
d3: movnti %r9,2*8(%rdi) .Ls5: movq 4*8(%rsi),%r11
d4: movnti %r10,3*8(%rdi) .Ls6: movq 5*8(%rsi),%r8
.Ls7: movq 6*8(%rsi),%r9
s5: movq 4*8(%rsi),%r11 .Ls8: movq 7*8(%rsi),%r10
s6: movq 5*8(%rsi),%r8 .Ld5: movq %r11,4*8(%rdi)
s7: movq 6*8(%rsi),%r9 .Ld6: movq %r8,5*8(%rdi)
s8: movq 7*8(%rsi),%r10 .Ld7: movq %r9,6*8(%rdi)
d5: movnti %r11,4*8(%rdi) .Ld8: movq %r10,7*8(%rdi)
d6: movnti %r8,5*8(%rdi)
d7: movnti %r9,6*8(%rdi)
d8: movnti %r10,7*8(%rdi)
addq %rbx,%rsi
addq %rbx,%rdi
decq %rdx decq %rdx
jz loop_no_prefetch
jns loop
handle_tail: leaq 64(%rsi),%rsi
leaq 64(%rdi),%rdi
jns .Lloop
.p2align 4
.Lhandle_tail:
movl %ecx,%edx movl %ecx,%edx
andl $63,%ecx andl $63,%ecx
shrl $3,%ecx shrl $3,%ecx
jz handle_7 jz .Lhandle_7
movl $8,%ebx movl $8,%ebx
loop_8: .p2align 4
s9: movq (%rsi),%r8 .Lloop_8:
d9: movq %r8,(%rdi) .Ls9: movq (%rsi),%r8
addq %rbx,%rdi .Ld9: movq %r8,(%rdi)
addq %rbx,%rsi
decl %ecx decl %ecx
jnz loop_8 leaq 8(%rdi),%rdi
leaq 8(%rsi),%rsi
jnz .Lloop_8
handle_7: .Lhandle_7:
movl %edx,%ecx movl %edx,%ecx
andl $7,%ecx andl $7,%ecx
jz ende jz .Lende
loop_1: .p2align 4
s10: movb (%rsi),%bl .Lloop_1:
d10: movb %bl,(%rdi) .Ls10: movb (%rsi),%bl
.Ld10: movb %bl,(%rdi)
incq %rdi incq %rdi
incq %rsi incq %rsi
decl %ecx decl %ecx
jnz loop_1 jnz .Lloop_1
ende: .Lende:
sfence
popq %rbx popq %rbx
ret ret
#ifdef FIX_ALIGNMENT #ifdef FIX_ALIGNMENT
/* align destination */ /* align destination */
bad_alignment: .p2align 4
.Lbad_alignment:
movl $8,%r9d movl $8,%r9d
subl %ecx,%r9d subl %ecx,%r9d
movl %r9d,%ecx movl %r9d,%ecx
subq %r9,%rdx subq %r9,%rdx
jz small_align jz .Lsmall_align
js small_align js .Lsmall_align
align_1: .Lalign_1:
s11: movb (%rsi),%bl .Ls11: movb (%rsi),%bl
d11: movb %bl,(%rdi) .Ld11: movb %bl,(%rdi)
incq %rsi incq %rsi
incq %rdi incq %rdi
decl %ecx decl %ecx
jnz align_1 jnz .Lalign_1
jmp after_bad_alignment jmp .Lafter_bad_alignment
small_align: .Lsmall_align:
addq %r9,%rdx addq %r9,%rdx
jmp handle_7 jmp .Lhandle_7
#endif #endif
/* table sorted by exception address */ /* table sorted by exception address */
.section __ex_table,"a" .section __ex_table,"a"
.align 8 .align 8
.quad s1,s1e .quad .Ls1,.Ls1e
.quad s2,s2e .quad .Ls2,.Ls2e
.quad s3,s3e .quad .Ls3,.Ls3e
.quad s4,s4e .quad .Ls4,.Ls4e
.quad d1,s1e .quad .Ld1,.Ls1e
.quad d2,s2e .quad .Ld2,.Ls2e
.quad d3,s3e .quad .Ld3,.Ls3e
.quad d4,s4e .quad .Ld4,.Ls4e
.quad s5,s5e .quad .Ls5,.Ls5e
.quad s6,s6e .quad .Ls6,.Ls6e
.quad s7,s7e .quad .Ls7,.Ls7e
.quad s8,s8e .quad .Ls8,.Ls8e
.quad d5,s5e .quad .Ld5,.Ls5e
.quad d6,s6e .quad .Ld6,.Ls6e
.quad d7,s7e .quad .Ld7,.Ls7e
.quad d8,s8e .quad .Ld8,.Ls8e
.quad s9,e_quad .quad .Ls9,.Le_quad
.quad d9,e_quad .quad .Ld9,.Le_quad
.quad s10,e_byte .quad .Ls10,.Le_byte
.quad d10,e_byte .quad .Ld10,.Le_byte
#ifdef FIX_ALIGNMENT #ifdef FIX_ALIGNMENT
.quad s11,e_byte .quad .Ls11,.Le_byte
.quad d11,e_byte .quad .Ld11,.Le_byte
#endif #endif
.quad e5,e_zero .quad .Le5,.Le_zero
.previous .previous
/* compute 64-offset for main loop. 8 bytes accuracy with error on the /* compute 64-offset for main loop. 8 bytes accuracy with error on the
pessimistic side. this is gross. it would be better to fix the pessimistic side. this is gross. it would be better to fix the
interface. */ interface. */
/* eax: zero, ebx: 64 */ /* eax: zero, ebx: 64 */
s1e: addl $8,%eax .Ls1e: addl $8,%eax
s2e: addl $8,%eax .Ls2e: addl $8,%eax
s3e: addl $8,%eax .Ls3e: addl $8,%eax
s4e: addl $8,%eax .Ls4e: addl $8,%eax
s5e: addl $8,%eax .Ls5e: addl $8,%eax
s6e: addl $8,%eax .Ls6e: addl $8,%eax
s7e: addl $8,%eax .Ls7e: addl $8,%eax
s8e: addl $8,%eax .Ls8e: addl $8,%eax
addq %rbx,%rdi /* +64 */ addq %rbx,%rdi /* +64 */
subq %rax,%rdi /* correct destination with computed offset */ subq %rax,%rdi /* correct destination with computed offset */
...@@ -216,22 +213,22 @@ s8e: addl $8,%eax ...@@ -216,22 +213,22 @@ s8e: addl $8,%eax
addq %rax,%rdx /* add offset to loopcnt */ addq %rax,%rdx /* add offset to loopcnt */
andl $63,%ecx /* remaining bytes */ andl $63,%ecx /* remaining bytes */
addq %rcx,%rdx /* add them */ addq %rcx,%rdx /* add them */
jmp zero_rest jmp .Lzero_rest
/* exception on quad word loop in tail handling */ /* exception on quad word loop in tail handling */
/* ecx: loopcnt/8, %edx: length, rdi: correct */ /* ecx: loopcnt/8, %edx: length, rdi: correct */
e_quad: .Le_quad:
shll $3,%ecx shll $3,%ecx
andl $7,%edx andl $7,%edx
addl %ecx,%edx addl %ecx,%edx
/* edx: bytes to zero, rdi: dest, eax:zero */ /* edx: bytes to zero, rdi: dest, eax:zero */
zero_rest: .Lzero_rest:
movq %rdx,%rcx movq %rdx,%rcx
e_byte: .Le_byte:
xorl %eax,%eax xorl %eax,%eax
e5: rep .Le5: rep
stosb stosb
/* when there is another exception while zeroing the rest just return */ /* when there is another exception while zeroing the rest just return */
e_zero: .Le_zero:
movq %rdx,%rax movq %rdx,%rax
jmp ende jmp .Lende
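Note on the `.Ls1e` ... `.Ls8e` fixup chain above: a fault in the unrolled loop enters the chain at the label matching the faulting slot and falls through the remaining `addl $8,%eax` instructions. The sketch below models only the arithmetic visible in this hunk (the lines hidden by the hunk boundary are not reconstructed); the function names are hypothetical.

```c
/* Value of eax after entering the chain at .Ls<k>e, k = 1..8:
 * labels k through 8 each execute one "addl $8,%eax". */
static unsigned fixup_eax(unsigned k)
{
    return 8u * (9u - k);
}

/* Net change to rdi from "addq %rbx,%rdi" (+64, since ebx holds 64)
 * followed by "subq %rax,%rdi": the bytes treated as already copied. */
static unsigned fixup_dest_advance(unsigned k)
{
    return 64u - fixup_eax(k);
}
```

So a fault in the first slot leaves the destination pointer where the iteration started, while a fault in the fifth slot advances it by 32 bytes, at the 8-byte granularity the source comment mentions.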
...@@ -226,17 +226,18 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code) ...@@ -226,17 +226,18 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
bad_area_nosemaphore: bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (error_code & 4) {
#ifdef CONFIG_IA32_EMULATION #ifdef CONFIG_IA32_EMULATION
/* 32bit vsyscall. map on demand. */ /* 32bit vsyscall. map on demand. */
if (test_thread_flag(TIF_IA32) && if (test_thread_flag(TIF_IA32) &&
address >= 0xffffe000 && address < 0xffffefff-7) { address >= 0xffffe000 && address < 0xffffe000 + PAGE_SIZE) {
if (map_syscall32(mm, address) < 0) if (map_syscall32(mm, address) < 0)
goto out_of_memory2; goto out_of_memory2;
return; return;
} }
#endif #endif
/* User mode accesses just cause a SIGSEGV */
if (error_code & 4) {
printk(KERN_INFO printk(KERN_INFO
"%s[%d] segfault at rip:%lx rsp:%lx adr:%lx err:%lx\n", "%s[%d] segfault at rip:%lx rsp:%lx adr:%lx err:%lx\n",
tsk->comm, tsk->pid, regs->rip, regs->rsp, address, tsk->comm, tsk->pid, regs->rip, regs->rsp, address,
......
...@@ -39,7 +39,7 @@ ...@@ -39,7 +39,7 @@
#include <asm/proto.h> #include <asm/proto.h>
#include <asm/smp.h> #include <asm/smp.h>
#define Dprintk(x...) printk(x) #define Dprintk(x...)
struct mmu_gather mmu_gathers[NR_CPUS]; struct mmu_gather mmu_gathers[NR_CPUS];
...@@ -105,7 +105,7 @@ static void set_pte_phys(unsigned long vaddr, ...@@ -105,7 +105,7 @@ static void set_pte_phys(unsigned long vaddr,
pml4_t *level4; pml4_t *level4;
pgd_t *pgd; pgd_t *pgd;
pmd_t *pmd; pmd_t *pmd;
pte_t *pte; pte_t *pte, new_pte;
Dprintk("set_pte_phys %lx to %lx\n", vaddr, phys); Dprintk("set_pte_phys %lx to %lx\n", vaddr, phys);
...@@ -132,11 +132,13 @@ static void set_pte_phys(unsigned long vaddr, ...@@ -132,11 +132,13 @@ static void set_pte_phys(unsigned long vaddr,
return; return;
} }
} }
new_pte = pfn_pte(phys >> PAGE_SHIFT, prot);
pte = pte_offset_kernel(pmd, vaddr); pte = pte_offset_kernel(pmd, vaddr);
/* CHECKME: */ if (!pte_none(*pte) &&
if (pte_val(*pte)) pte_val(*pte) != (pte_val(new_pte) & __supported_pte_mask))
pte_ERROR(*pte); pte_ERROR(*pte);
set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, prot)); set_pte(pte, new_pte);
/* /*
* It's enough to flush this one mapping. * It's enough to flush this one mapping.
...@@ -340,6 +342,35 @@ void __init paging_init(void) ...@@ -340,6 +342,35 @@ void __init paging_init(void)
} }
#endif #endif
/* Unmap a kernel mapping if it exists. This is useful to avoid prefetches
from the CPU leading to inconsistent cache lines. address and size
must be aligned to 2MB boundaries.
Does nothing when the mapping doesn't exist. */
void __init clear_kernel_mapping(unsigned long address, unsigned long size)
{
unsigned long end = address + size;
BUG_ON(address & ~LARGE_PAGE_MASK);
BUG_ON(size & ~LARGE_PAGE_MASK);
for (; address < end; address += LARGE_PAGE_SIZE) {
pgd_t *pgd = pgd_offset_k(address);
if (!pgd || pgd_none(*pgd))
continue;
pmd_t *pmd = pmd_offset(pgd, address);
if (!pmd || pmd_none(*pmd))
continue;
if (0 == (pmd_val(*pmd) & _PAGE_PSE)) {
/* Could handle this, but it should not happen currently. */
printk(KERN_ERR
"clear_kernel_mapping: mapping has been split. will leak memory\n");
pmd_ERROR(*pmd);
}
set_pmd(pmd, __pmd(0));
}
__flush_tlb_all();
}
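Note on `clear_kernel_mapping` above: the two `BUG_ON` checks enforce the 2MB alignment the comment asks for, and the loop then visits one large page per step. A minimal user-space sketch of that arithmetic follows; the `SKETCH_` constants are assumptions chosen to match the "2MB" in the comment, not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: a 2 MiB large page and the matching mask. */
#define SKETCH_LARGE_PAGE_SIZE (UINT64_C(1) << 21)
#define SKETCH_LARGE_PAGE_MASK (~(SKETCH_LARGE_PAGE_SIZE - 1))

/* Mirrors the BUG_ON condition: any bit below the 2 MiB boundary
 * means the value is not large-page aligned. */
static bool large_page_aligned(uint64_t v)
{
    return (v & ~SKETCH_LARGE_PAGE_MASK) == 0;
}

/* Number of loop iterations for an aligned [address, address+size). */
static uint64_t large_pages_in_range(uint64_t address, uint64_t size)
{
    uint64_t end = address + size;
    uint64_t n = 0;

    for (; address < end; address += SKETCH_LARGE_PAGE_SIZE)
        n++;
    return n;
}
```

With these values a 6 MiB aligned range covers three PMD entries, and an address such as `0x201000` would trip the alignment check.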
static inline int page_is_ram (unsigned long pagenr) static inline int page_is_ram (unsigned long pagenr)
{ {
int i; int i;
......
...@@ -87,10 +87,8 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end) ...@@ -87,10 +87,8 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end)
if (limit > end_pfn_map << PAGE_SHIFT) if (limit > end_pfn_map << PAGE_SHIFT)
limit = end_pfn_map << PAGE_SHIFT; limit = end_pfn_map << PAGE_SHIFT;
if (limit <= base) { if (limit <= base)
printk(KERN_INFO "Node %d beyond memory map\n", nodeid);
continue; continue;
}
base >>= 16; base >>= 16;
base <<= 24; base <<= 24;
......
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
#include <asm/dma.h> #include <asm/dma.h>
#include <asm/numa.h> #include <asm/numa.h>
#define Dprintk(x...) printk(x) #define Dprintk(x...)
struct pglist_data *node_data[MAXNODE]; struct pglist_data *node_data[MAXNODE];
bootmem_data_t plat_node_bdata[MAX_NUMNODES]; bootmem_data_t plat_node_bdata[MAX_NUMNODES];
...@@ -104,8 +104,11 @@ void __init setup_node_bootmem(int nodeid, unsigned long start, unsigned long en ...@@ -104,8 +104,11 @@ void __init setup_node_bootmem(int nodeid, unsigned long start, unsigned long en
reserve_bootmem_node(NODE_DATA(nodeid), nodedata_phys, pgdat_size); reserve_bootmem_node(NODE_DATA(nodeid), nodedata_phys, pgdat_size);
reserve_bootmem_node(NODE_DATA(nodeid), bootmap_start, bootmap_pages<<PAGE_SHIFT); reserve_bootmem_node(NODE_DATA(nodeid), bootmap_start, bootmap_pages<<PAGE_SHIFT);
if (nodeid + 1 > numnodes) if (nodeid + 1 > numnodes) {
numnodes = nodeid + 1; numnodes = nodeid + 1;
printk(KERN_INFO
"setup_node_bootmem: enlarging numnodes to %d\n", numnodes);
}
nodes_present |= (1UL << nodeid); nodes_present |= (1UL << nodeid);
} }
...@@ -121,7 +124,7 @@ void __init setup_node_zones(int nodeid) ...@@ -121,7 +124,7 @@ void __init setup_node_zones(int nodeid)
start_pfn = node_start_pfn(nodeid); start_pfn = node_start_pfn(nodeid);
end_pfn = node_end_pfn(nodeid); end_pfn = node_end_pfn(nodeid);
printk("setting up node %d %lx-%lx\n", nodeid, start_pfn, end_pfn); printk(KERN_INFO "setting up node %d %lx-%lx\n", nodeid, start_pfn, end_pfn);
/* All nodes > 0 have a zero length zone DMA */ /* All nodes > 0 have a zero length zone DMA */
dma_end_pfn = __pa(MAX_DMA_ADDRESS) >> PAGE_SHIFT; dma_end_pfn = __pa(MAX_DMA_ADDRESS) >> PAGE_SHIFT;
......
...@@ -63,29 +63,53 @@ static void flush_kernel_map(void *address) ...@@ -63,29 +63,53 @@ static void flush_kernel_map(void *address)
__flush_tlb_one(address); __flush_tlb_one(address);
} }
static inline void flush_map(unsigned long address)
{
on_each_cpu(flush_kernel_map, (void *)address, 1, 1);
}
struct deferred_page {
struct deferred_page *next;
struct page *fpage;
unsigned long address;
};
static struct deferred_page *df_list; /* protected by init_mm.mmap_sem */
static inline void save_page(unsigned long address, struct page *fpage)
{
struct deferred_page *df;
df = kmalloc(sizeof(struct deferred_page), GFP_KERNEL);
if (!df) {
flush_map(address);
__free_page(fpage);
} else {
df->next = df_list;
df->fpage = fpage;
df->address = address;
df_list = df;
}
}
/* /*
* No more special protections in this 2/4MB area - revert to a * No more special protections in this 2/4MB area - revert to a
* large page again. * large page again.
*/ */
static inline void revert_page(struct page *kpte_page, unsigned long address) static void revert_page(struct page *kpte_page, unsigned long address)
{ {
pgd_t *pgd; pgd_t *pgd;
pmd_t *pmd; pmd_t *pmd;
pte_t large_pte; pte_t large_pte;
pgd = pgd_offset_k(address); pgd = pgd_offset_k(address);
if (!pgd) BUG();
pmd = pmd_offset(pgd, address); pmd = pmd_offset(pgd, address);
if (!pmd) BUG(); BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
if ((pmd_val(*pmd) & _PAGE_GLOBAL) == 0) BUG();
large_pte = mk_pte_phys(__pa(address) & LARGE_PAGE_MASK, PAGE_KERNEL_LARGE); large_pte = mk_pte_phys(__pa(address) & LARGE_PAGE_MASK, PAGE_KERNEL_LARGE);
set_pte((pte_t *)pmd, large_pte); set_pte((pte_t *)pmd, large_pte);
} }
static int static int
__change_page_attr(unsigned long address, struct page *page, pgprot_t prot, __change_page_attr(unsigned long address, struct page *page, pgprot_t prot)
struct page **oldpage)
{ {
pte_t *kpte; pte_t *kpte;
struct page *kpte_page; struct page *kpte_page;
...@@ -107,6 +131,7 @@ __change_page_attr(unsigned long address, struct page *page, pgprot_t prot, ...@@ -107,6 +131,7 @@ __change_page_attr(unsigned long address, struct page *page, pgprot_t prot,
struct page *split = split_large_page(address, prot); struct page *split = split_large_page(address, prot);
if (!split) if (!split)
return -ENOMEM; return -ENOMEM;
atomic_inc(&kpte_page->count);
set_pte(kpte,mk_pte(split, PAGE_KERNEL)); set_pte(kpte,mk_pte(split, PAGE_KERNEL));
} }
} else if ((kpte_flags & _PAGE_PSE) == 0) { } else if ((kpte_flags & _PAGE_PSE) == 0) {
...@@ -115,39 +140,12 @@ __change_page_attr(unsigned long address, struct page *page, pgprot_t prot, ...@@ -115,39 +140,12 @@ __change_page_attr(unsigned long address, struct page *page, pgprot_t prot,
} }
if (atomic_read(&kpte_page->count) == 1) { if (atomic_read(&kpte_page->count) == 1) {
*oldpage = kpte_page; save_page(address, kpte_page);
revert_page(kpte_page, address); revert_page(kpte_page, address);
} }
return 0; return 0;
} }
static inline void flush_map(unsigned long address)
{
on_each_cpu(flush_kernel_map, (void *)address, 1, 1);
}
struct deferred_page {
struct deferred_page *next;
struct page *fpage;
unsigned long address;
};
static struct deferred_page *df_list; /* protected by init_mm.mmap_sem */
static inline void save_page(unsigned long address, struct page *fpage)
{
struct deferred_page *df;
df = kmalloc(sizeof(struct deferred_page), GFP_KERNEL);
if (!df) {
flush_map(address);
__free_page(fpage);
} else {
df->next = df_list;
df->fpage = fpage;
df->address = address;
df_list = df;
}
}
/* /*
* Change the page attributes of an page in the linear mapping. * Change the page attributes of an page in the linear mapping.
* *
...@@ -164,24 +162,19 @@ static inline void save_page(unsigned long address, struct page *fpage) ...@@ -164,24 +162,19 @@ static inline void save_page(unsigned long address, struct page *fpage)
int change_page_attr(struct page *page, int numpages, pgprot_t prot) int change_page_attr(struct page *page, int numpages, pgprot_t prot)
{ {
int err = 0; int err = 0;
struct page *fpage, *fpage2;
int i; int i;
down_write(&init_mm.mmap_sem); down_write(&init_mm.mmap_sem);
for (i = 0; i < numpages; i++, page++) { for (i = 0; i < numpages; !err && i++, page++) {
unsigned long address = (unsigned long)page_address(page); unsigned long address = (unsigned long)page_address(page);
fpage = NULL; err = __change_page_attr(address, page, prot);
err = __change_page_attr(address, page, prot, &fpage); if (err)
break;
/* Handle kernel mapping too which aliases part of the lowmem */ /* Handle kernel mapping too which aliases part of the lowmem */
if (!err && page_to_phys(page) < KERNEL_TEXT_SIZE) { if (page_to_phys(page) < KERNEL_TEXT_SIZE) {
unsigned long addr2 = __START_KERNEL_map + page_to_phys(page); unsigned long addr2 = __START_KERNEL_map + page_to_phys(page);
fpage2 = NULL; err = __change_page_attr(addr2, page, prot);
err = __change_page_attr(addr2, page, prot, &fpage2);
if (fpage2)
save_page(addr2, fpage2);
} }
if (fpage)
save_page(address, fpage);
} }
up_write(&init_mm.mmap_sem); up_write(&init_mm.mmap_sem);
return err; return err;
......
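Note on the pageattr.c hunks above: `save_page` pushes a record onto the singly linked `df_list` so the page can be freed later, and falls back to flushing and freeing immediately when the allocation fails. The sketch below imitates that pattern in user space with `malloc`; every name in it is hypothetical, and the drain function is an assumption about how such a list would be consumed, since that part of the file is not shown here.

```c
#include <stdlib.h>

struct deferred_item {
    struct deferred_item *next;
    void *payload;            /* stands in for the struct page pointer */
    unsigned long address;
};

static struct deferred_item *deferred_head;

/* Returns 1 if the release was queued, 0 if the allocation failed
 * and the caller has to release the payload immediately. */
static int defer_release(unsigned long address, void *payload)
{
    struct deferred_item *d = malloc(sizeof(*d));

    if (!d)
        return 0;
    d->next = deferred_head;  /* push at the head, as save_page does */
    d->payload = payload;
    d->address = address;
    deferred_head = d;
    return 1;
}

/* Hypothetical consumer: detach the list, free each record and
 * return how many were processed. */
static unsigned drain_deferred(void)
{
    struct deferred_item *d = deferred_head;
    unsigned n = 0;

    deferred_head = NULL;
    while (d) {
        struct deferred_item *next = d->next;
        free(d);
        d = next;
        n++;
    }
    return n;
}
```

Pushing at the head keeps the insert O(1); the order in which deferred pages are later freed does not matter for this use.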
...@@ -378,8 +378,9 @@ static struct irq_info *pirq_get_info(struct pci_dev *dev) ...@@ -378,8 +378,9 @@ static struct irq_info *pirq_get_info(struct pci_dev *dev)
return NULL; return NULL;
} }
static void pcibios_test_irq_handler(int irq, void *dev_id, struct pt_regs *regs) static irqreturn_t pcibios_test_irq_handler(int irq, void *dev_id, struct pt_regs *regs)
{ {
return IRQ_NONE;
} }
static int pcibios_lookup_irq(struct pci_dev *dev, int assign) static int pcibios_lookup_irq(struct pci_dev *dev, int assign)
......
...@@ -127,7 +127,7 @@ SECTIONS ...@@ -127,7 +127,7 @@ SECTIONS
/* Sections to be discarded */ /* Sections to be discarded */
/DISCARD/ : { /DISCARD/ : {
*(.exit.data) *(.exit.data)
*(.exit.text) /* *(.exit.text) */
*(.exitcall.exit) *(.exitcall.exit)
*(.eh_frame) *(.eh_frame)
} }
......
...@@ -9,7 +9,7 @@ ...@@ -9,7 +9,7 @@
#ifdef CONFIG_X86_LOCAL_APIC #ifdef CONFIG_X86_LOCAL_APIC
#define APIC_DEBUG 1 #define APIC_DEBUG 0
#if APIC_DEBUG #if APIC_DEBUG
#define Dprintk(x...) printk(x) #define Dprintk(x...) printk(x)
......
...@@ -84,8 +84,9 @@ ...@@ -84,8 +84,9 @@
movq \offset+72(%rsp),%rax movq \offset+72(%rsp),%rax
.endm .endm
#define REST_SKIP 6*8
.macro SAVE_REST .macro SAVE_REST
subq $6*8,%rsp subq $REST_SKIP,%rsp
movq %rbx,5*8(%rsp) movq %rbx,5*8(%rsp)
movq %rbp,4*8(%rsp) movq %rbp,4*8(%rsp)
movq %r12,3*8(%rsp) movq %r12,3*8(%rsp)
...@@ -94,7 +95,6 @@ ...@@ -94,7 +95,6 @@
movq %r15,(%rsp) movq %r15,(%rsp)
.endm .endm
#define REST_SKIP 6*8
.macro RESTORE_REST .macro RESTORE_REST
movq (%rsp),%r15 movq (%rsp),%r15
movq 1*8(%rsp),%r14 movq 1*8(%rsp),%r14
......
#ifndef _ASM_X86_64_COMPAT_H #ifndef _ASM_X86_64_COMPAT_H
#define _ASM_X86_64_COMPAT_H #define _ASM_X86_64_COMPAT_H
/* /*
* Architecture specific compatibility types * Architecture specific compatibility types
*/ */
#include <linux/types.h> #include <linux/types.h>
#include <linux/sched.h>
#define COMPAT_USER_HZ 100 #define COMPAT_USER_HZ 100
......
...@@ -58,7 +58,7 @@ ...@@ -58,7 +58,7 @@
We can slow the instruction pipeline for instructions coming via the We can slow the instruction pipeline for instructions coming via the
gdt or the ldt if we want to. I am not sure why this is an advantage */ gdt or the ldt if we want to. I am not sure why this is an advantage */
#define DR_CONTROL_RESERVED (0xFFFFFFFFFC00) /* Reserved by Intel */ #define DR_CONTROL_RESERVED (0xFFFFFFFF0000FC00UL) /* Reserved */
#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */ #define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */ #define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */
......
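Note on the `DR_CONTROL_RESERVED` change above: the old constant `0xFFFFFFFFFC00` covers bits 10 to 47, so it treats the DR7 read/write and length fields (bits 16 to 31) as reserved and leaves bits 48 to 63 unchecked. The new constant covers bits 10 to 15 and 32 to 63. The sketch below compares the two masks; the helper name is hypothetical, and how the kernel applies the mask (in code collapsed on this page) is not assumed.

```c
#include <stdint.h>

#define OLD_DR_CONTROL_RESERVED UINT64_C(0xFFFFFFFFFC00)
#define NEW_DR_CONTROL_RESERVED UINT64_C(0xFFFFFFFF0000FC00)

/* Nonzero when the DR7 value sets a bit that the mask marks reserved. */
static int dr7_uses_reserved(uint64_t dr7, uint64_t mask)
{
    return (dr7 & mask) != 0;
}
```

For instance, a value of `0x30000` (the two R/W bits for breakpoint 0) is flagged by the old mask but accepted by the new one, while a value with bit 63 set is flagged only by the new mask.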
...@@ -50,7 +50,7 @@ extern void contig_e820_setup(void); ...@@ -50,7 +50,7 @@ extern void contig_e820_setup(void);
extern unsigned long e820_end_of_ram(void); extern unsigned long e820_end_of_ram(void);
extern void e820_reserve_resources(void); extern void e820_reserve_resources(void);
extern void e820_print_map(char *who); extern void e820_print_map(char *who);
extern int e820_mapped(unsigned long start, unsigned long end, int type); extern int e820_mapped(unsigned long start, unsigned long end, unsigned type);
extern void e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end); extern void e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end);
......
This diff is collapsed.