Commit d8f19f2c authored by Andi Kleen, committed by Vojtech Pavlik

[PATCH] x86-64 merge

This brings the x86-64 port up to date with 2.5.60. Unfortunately I cannot
test much because I constantly get deadlocks in exit/wait in the initscripts
on SMP bootup. The kernel still seems to lose a lot of SIGCHLD signals.
2.5.59/SMP had the same problem. Uniprocessor kernels, and SMP kernels on UP,
seem to work.

This patch only touches x86-64 specific files. It requires a few simple
changes to arch independent files that I will send separately.

 - Fixed a lot of obsolete/misleading configure help texts.
 - Remove old bootblock disk loader and support fdimage target for syslinux
   instead (H. Peter Anvin)
 - Fix a potential FPU signal-restore problem in 32-bit emulation.
 - Merge with 2.5.60 i386 (hugetlbfs, acpi etc.)
 - Some fixes for running with the local APIC disabled.
 - Beginnings of S3 ACPI wakeup from real mode (not working yet, don't use)
 - Beginnings of NUMA/CONFIG_DISCONTIGMEM support for AMD K8 (work in progress,
   port from 2.4): clean up memory mapping at bootup, generalize bootmem etc.
 - Fix 64bit GS base reload problem and reenable (Karsten Keil)
 - Fix a race where vmalloc accesses from interrupt handlers disturbed the
   page fault handler, and a similar race in the debug handler (thanks to
   Andrew Morton)
 - Merge cpu access primitives with i386
 - Revert to a private module list for now because putting modules
   into vmlist triggered too many problems.
 - Some cleanups, removal of unneeded code.
 - Let early __get_free_pages see a consistent PDA
 - Preempt is disabled for now because it is still too broken
 - Signal handler fixes
 - Fix do_gettimeofday to be completely lockless and reenable vsyscalls
   (see the sketch after this list)
 - Optimize context switch path a bit (should be ported to i386)
 - Get thread_info via stack for better code
 - Don't leak pmd pages
 - Clean up hardcoded task stack sizes.
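
For reference, a lockless do_gettimeofday as mentioned above is typically
built on a sequence counter: readers retry instead of taking a lock, which
is also what makes exporting the path as a vsyscall safe. Below is a minimal
sketch of that pattern, not the actual patch code; all names are made up,
and real code additionally needs memory barriers around the reads.

    struct time_state {
            volatile unsigned seq;  /* even = stable, odd = writer active */
            long sec;
            long usec;
    };

    /* Reader: retry until a snapshot was not concurrently updated. */
    static void read_time(const struct time_state *ts, long *sec, long *usec)
    {
            unsigned start;
            do {
                    start = ts->seq;        /* odd means update in flight */
                    *sec  = ts->sec;
                    *usec = ts->usec;
            } while ((start & 1) || ts->seq != start);
    }

    /* Writer: bump the counter around the update (writers serialized). */
    static void write_time(struct time_state *ts, long sec, long usec)
    {
            ts->seq++;              /* now odd: readers will retry */
            ts->sec  = sec;
            ts->usec = usec;
            ts->seq++;              /* even again: snapshot is stable */
    }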
parent 3ab054ff
...@@ -19,11 +19,6 @@ config X86_64 ...@@ -19,11 +19,6 @@ config X86_64
config X86 config X86
bool bool
default y default y
help
This is Linux's home port. Linux was originally native to the Intel
386, and runs on all the later x86 processors including the Intel
486, 586, Pentiums, and various instruction-set-compatible chips by
AMD, Cyrix, and others.
config MMU config MMU
bool bool
...@@ -35,20 +30,10 @@ config SWAP ...@@ -35,20 +30,10 @@ config SWAP
config ISA config ISA
bool bool
help
Find out whether you have ISA slots on your motherboard. ISA is the
name of a bus system, i.e. the way the CPU talks to the other stuff
inside your box. Other bus systems are PCI, EISA, MicroChannel
(MCA) or VESA. ISA is an older system, now being displaced by PCI;
newer boards don't support it. If you have ISA, say Y, otherwise N.
config SBUS config SBUS
bool bool
config UID16
bool
default y
config RWSEM_GENERIC_SPINLOCK config RWSEM_GENERIC_SPINLOCK
bool bool
default y default y
...@@ -86,14 +71,14 @@ choice ...@@ -86,14 +71,14 @@ choice
default MK8 default MK8
config MK8 config MK8
bool "AMD-Hammer" bool "AMD-Opteron/Athlon64"
help help
Support for AMD Clawhammer/Sledgehammer CPUs. Only choice for x86-64 Optimize for AMD Opteron/Athlon64/Hammer/K8 CPUs.
currently so you should choose this if you want a x86-64 kernel. In fact
you will have no other choice than to choose this.
config GENERIC_CPU config GENERIC_CPU
bool "Generic-x86-64" bool "Generic-x86-64"
help
Generic x86-64 CPU.
endchoice endchoice
...@@ -196,25 +181,12 @@ config SMP ...@@ -196,25 +181,12 @@ config SMP
singleprocessor machines. On a singleprocessor machine, the kernel singleprocessor machines. On a singleprocessor machine, the kernel
will run faster if you say N here. will run faster if you say N here.
Note that if you say Y here and choose architecture "586" or
"Pentium" under "Processor family", the kernel will not work on 486
architectures. Similarly, multiprocessor kernels for the "PPro"
architecture may not work on all Pentium based boards.
People using multiprocessor machines who say Y here should also say
Y to "Enhanced Real Time Clock Support", below. The "Advanced Power
Management" code will be disabled if you say Y here.
See also the <file:Documentation/smp.tex>,
<file:Documentation/smp.txt>, <file:Documentation/i386/IO-APIC.txt>,
<file:Documentation/nmi_watchdog.txt> and the SMP-HOWTO available at
<http://www.tldp.org/docs.html#howto>.
If you don't know what to do here, say N. If you don't know what to do here, say N.
# broken currently
config PREEMPT config PREEMPT
depends on NOT_WORKING
bool "Preemptible Kernel" bool "Preemptible Kernel"
depends on !SMP
---help--- ---help---
This option reduces the latency of the kernel when reacting to This option reduces the latency of the kernel when reacting to
real-time or interactive events by allowing a low priority process to real-time or interactive events by allowing a low priority process to
...@@ -229,6 +201,28 @@ config PREEMPT ...@@ -229,6 +201,28 @@ config PREEMPT
Say Y here if you are feeling brave and building a kernel for a Say Y here if you are feeling brave and building a kernel for a
desktop, embedded or real-time system. Say N if you are unsure. desktop, embedded or real-time system. Say N if you are unsure.
# someone write a better help text please.
config K8_NUMA
bool "K8 NUMA support"
depends on SMP
help
Enable NUMA (Non-Uniform Memory Access) support for
AMD Opteron multiprocessor systems. The kernel will try to allocate
memory used by a CPU on that CPU's local memory controller,
and will gain more optimizations in the future. This may or may not
improve performance. The code is still experimental.
Say N if unsure.
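
As a rough sketch of the allocation policy this option enables (not code
from this patch; the helper name is invented, the calls are the kernel's
generic NUMA allocator API):

    /* Try the current CPU's node first, then fall back to any node. */
    static struct page *numa_alloc_local(unsigned int gfp_mask,
                                         unsigned int order)
    {
            int nid = cpu_to_node(smp_processor_id());
            struct page *page = alloc_pages_node(nid, gfp_mask, order);
            return page ? page : alloc_pages(gfp_mask, order);
    }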
config DISCONTIGMEM
bool
depends on K8_NUMA
default y
config NUMA
bool
depends on K8_NUMA
default y
config HAVE_DEC_LOCK config HAVE_DEC_LOCK
bool bool
depends on SMP depends on SMP
...@@ -245,15 +239,17 @@ config NR_CPUS ...@@ -245,15 +239,17 @@ config NR_CPUS
kernel will support. The maximum supported value is 32 and the kernel will support. The maximum supported value is 32 and the
minimum value which makes sense is 2. minimum value which makes sense is 2.
This is purely to save memory - each supported CPU adds This is purely to save memory - each supported CPU requires
approximately eight kilobytes to the kernel image. memory in the static kernel configuration.
config GART_IOMMU config GART_IOMMU
bool "IOMMU support" bool "IOMMU support"
help help
Support the K8 IOMMU. Needed to run systems with more than 4GB of memory Support the K8 IOMMU. Needed to run systems with more than 4GB of memory
properly with 32-bit devices. You should probably turn this on. properly with 32-bit PCI devices that do not support DAC (Double Address
The iommu can be turned off at runtime with the iommu=off parameter. Cycle). The IOMMU can be turned off at runtime with the iommu=off parameter.
Normally the kernel will make the right choice by itself.
If unsure, say Y.
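
Put differently, the decision the help text describes looks roughly like
this (a hedged sketch, not patch code; iommu_map_single() is an invented
name standing in for the GART mapping path):

    /* Decide whether a buffer must be remapped through the GART. */
    static dma_addr_t map_for_device(unsigned long phys_addr, u64 dev_dma_mask)
    {
            if (phys_addr <= dev_dma_mask)
                    return phys_addr;            /* device reaches it directly */
            return iommu_map_single(phys_addr);  /* hypothetical GART remap */
    }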
config DUMMY_IOMMU config DUMMY_IOMMU
bool bool
...@@ -291,7 +287,8 @@ config PM ...@@ -291,7 +287,8 @@ config PM
Note that, even if you say N here, Linux on the x86 architecture Note that, even if you say N here, Linux on the x86 architecture
will issue the hlt instruction if nothing is to be done, thereby will issue the hlt instruction if nothing is to be done, thereby
sending the processor to sleep and saving power. sending the processor to limited sleep and saving power. However,
using ACPI will likely save more power.
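
The idle behaviour described above is the standard x86 idiom (a minimal
sketch, not code from this patch):

    /* Enable interrupts and halt until the next one arrives. */
    static inline void idle_halt(void)
    {
            asm volatile("sti; hlt" ::: "memory");
    }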
config SOFTWARE_SUSPEND config SOFTWARE_SUSPEND
bool "Software Suspend (EXPERIMENTAL)" bool "Software Suspend (EXPERIMENTAL)"
...@@ -331,16 +328,6 @@ menu "Bus options (PCI etc.)" ...@@ -331,16 +328,6 @@ menu "Bus options (PCI etc.)"
config PCI config PCI
bool "PCI support" bool "PCI support"
help
Find out whether you have a PCI motherboard. PCI is the name of a
bus system, i.e. the way the CPU talks to the other stuff inside
your box. Other bus systems are ISA, EISA, MicroChannel (MCA) or
VESA. If you have PCI, say Y, otherwise N.
The PCI-HOWTO, available from
<http://www.tldp.org/docs.html#howto>, contains valuable
information about which PCI hardware does work under Linux and which
doesn't.
# x86-64 doesn't support PCI BIOS access from long mode so always go direct. # x86-64 doesn't support PCI BIOS access from long mode so always go direct.
config PCI_DIRECT config PCI_DIRECT
...@@ -381,54 +368,10 @@ config KCORE_ELF ...@@ -381,54 +368,10 @@ config KCORE_ELF
bool bool
depends on PROC_FS depends on PROC_FS
default y default y
---help---
If you enabled support for /proc file system then the file
/proc/kcore will contain the kernel core image. This can be used
in gdb:
$ cd /usr/src/linux ; gdb vmlinux /proc/kcore
You have two choices here: ELF and A.OUT. Selecting ELF will make
/proc/kcore appear in ELF core format as defined by the Executable
and Linkable Format specification. Selecting A.OUT will choose the
old "a.out" format which may be necessary for some old versions
of binutils or on some architectures.
This is especially useful if you have compiled the kernel with the
"-g" option to preserve debugging information. It is mainly used
for examining kernel data structures on the live kernel so if you
don't understand what this means or are not a kernel hacker, just
leave it at its default value ELF.
#tristate 'Kernel support for a.out binaries' CONFIG_BINFMT_AOUT
config BINFMT_ELF config BINFMT_ELF
tristate "Kernel support for ELF binaries" bool
---help--- default y
ELF (Executable and Linkable Format) is a format for libraries and
executables used across different architectures and operating
systems. Saying Y here will enable your kernel to run ELF binaries
and enlarge it by about 13 KB. ELF support under Linux has now all
but replaced the traditional Linux a.out formats (QMAGIC and ZMAGIC)
because it is portable (this does *not* mean that you will be able
to run executables from different architectures or operating systems
however) and makes building run-time libraries very easy. Many new
executables are distributed solely in ELF format. You definitely
want to say Y here.
Information about ELF is contained in the ELF HOWTO available from
<http://www.tldp.org/docs.html#howto>.
If you find that after upgrading from Linux kernel 1.2 and saying Y
here, you still can't run any ELF binaries (they just crash), then
you'll have to install the newest ELF runtime libraries, including
ld.so (check the file <file:Documentation/Changes> for location and
latest version).
If you want to compile this as a module ( = code which can be
inserted in and removed from the running kernel whenever you want),
say M here and read <file:Documentation/modules.txt>. The module
will be called binfmt_elf. Saying M or N here is dangerous because
some crucial programs on your system might be in ELF format.
config BINFMT_MISC config BINFMT_MISC
tristate "Kernel support for MISC binaries" tristate "Kernel support for MISC binaries"
...@@ -436,12 +379,9 @@ config BINFMT_MISC ...@@ -436,12 +379,9 @@ config BINFMT_MISC
If you say Y here, it will be possible to plug wrapper-driven binary If you say Y here, it will be possible to plug wrapper-driven binary
formats into the kernel. You will like this especially when you use formats into the kernel. You will like this especially when you use
programs that need an interpreter to run like Java, Python or programs that need an interpreter to run like Java, Python or
Emacs-Lisp. It's also useful if you often run DOS executables under Emacs-Lisp. Once you have registered such a binary class with the kernel,
the Linux DOS emulator DOSEMU (read the DOSEMU-HOWTO, available from you can start one of those programs simply by typing in its name at a shell
<http://www.tldp.org/docs.html#howto>). Once you have prompt; Linux will automatically feed it to the correct interpreter.
registered such a binary class with the kernel, you can start one of
those programs simply by typing in its name at a shell prompt; Linux
will automatically feed it to the correct interpreter.
You can do other nice things, too. Read the file You can do other nice things, too. Read the file
<file:Documentation/binfmt_misc.txt> to learn how to use this <file:Documentation/binfmt_misc.txt> to learn how to use this
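
Conceptually, the registration/dispatch described above boils down to
matching magic bytes and handing the file to the registered interpreter.
A simplified sketch (invented struct and function names, not the real
binfmt_misc code):

    /* Match the file's leading bytes against each registered entry. */
    struct misc_entry {
            const unsigned char *magic;   /* bytes to match at 'offset' */
            int offset, size;
            const char *interpreter;      /* program to hand the file to */
            struct misc_entry *next;
    };

    static const char *find_interpreter(struct misc_entry *list,
                                        const unsigned char *header)
    {
            struct misc_entry *e;
            for (e = list; e; e = e->next)
                    if (!memcmp(header + e->offset, e->magic, e->size))
                            return e->interpreter;
            return NULL;   /* no match: not a binfmt_misc binary */
    }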
...@@ -467,6 +407,12 @@ config COMPAT ...@@ -467,6 +407,12 @@ config COMPAT
depends on IA32_EMULATION depends on IA32_EMULATION
default y default y
config UID16
bool
depends on IA32_EMULATION
default y
endmenu endmenu
source "drivers/mtd/Kconfig" source "drivers/mtd/Kconfig"
...@@ -672,9 +618,11 @@ config DEBUG_SPINLOCK ...@@ -672,9 +618,11 @@ config DEBUG_SPINLOCK
best used in conjunction with the NMI watchdog so that spinlock best used in conjunction with the NMI watchdog so that spinlock
deadlocks are also debuggable. deadlocks are also debuggable.
# !SMP for now because the context switch early causes GPF in segment reloading
# and the GS base checking does the wrong thing then, causing a hang.
config CHECKING config CHECKING
bool "Additional run-time checks" bool "Additional run-time checks"
depends on DEBUG_KERNEL depends on DEBUG_KERNEL && !SMP
help help
Enables some internal consistency checks for kernel debugging. Enables some internal consistency checks for kernel debugging.
You should normally say N. You should normally say N.
...@@ -683,7 +631,8 @@ config INIT_DEBUG ...@@ -683,7 +631,8 @@ config INIT_DEBUG
bool "Debug __init statements" bool "Debug __init statements"
depends on DEBUG_KERNEL depends on DEBUG_KERNEL
help help
Fill __init and __initdata at the end of boot. This is only for debugging. Fill __init and __initdata at the end of boot. This helps debugging
illegal uses of __init and __initdata after initialization.
config KALLSYMS config KALLSYMS
bool "Load all symbols for debugging/kksymoops" bool "Load all symbols for debugging/kksymoops"
...@@ -696,11 +645,11 @@ config FRAME_POINTER ...@@ -696,11 +645,11 @@ config FRAME_POINTER
bool "Compile the kernel with frame pointers" bool "Compile the kernel with frame pointers"
depends on DEBUG_KERNEL depends on DEBUG_KERNEL
help help
If you say Y here the resulting kernel image will be slightly larger Compile the kernel with frame pointers. This may help for some
and slower, but it will give very useful debugging information. debugging with external debuggers. Note the standard oops backtracer
If you don't debug the kernel, you can say N, but we may not be able doesn't make use of it and the x86-64 kernel doesn't ensure a consistent
to solve problems without frame pointers. frame pointer through inline assembly (semaphores etc.)
Note this is normally not needed on x86-64. Normally you should say N.
endmenu endmenu
......
...@@ -58,7 +58,8 @@ drivers-$(CONFIG_OPROFILE) += arch/x86_64/oprofile/ ...@@ -58,7 +58,8 @@ drivers-$(CONFIG_OPROFILE) += arch/x86_64/oprofile/
boot := arch/x86_64/boot boot := arch/x86_64/boot
.PHONY: bzImage bzlilo bzdisk install archmrproper .PHONY: bzImage bzlilo install archmrproper \
fdimage fdimage144 fdimage288 archclean
#Default target when executing "make" #Default target when executing "make"
all: bzImage all: bzImage
...@@ -74,7 +75,7 @@ bzlilo: vmlinux ...@@ -74,7 +75,7 @@ bzlilo: vmlinux
bzdisk: vmlinux bzdisk: vmlinux
$(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) zdisk $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) zdisk
install: vmlinux install fdimage fdimage144 fdimage288: vmlinux
$(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@ $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@
archclean: archclean:
...@@ -103,3 +104,6 @@ define archhelp ...@@ -103,3 +104,6 @@ define archhelp
echo ' install to $$(INSTALL_PATH) and run lilo' echo ' install to $$(INSTALL_PATH) and run lilo'
endef endef
CLEAN_FILES += arch/$(ARCH)/boot/fdimage arch/$(ARCH)/boot/mtools.conf
...@@ -59,8 +59,36 @@ $(obj)/setup $(obj)/bootsect: %: %.o FORCE ...@@ -59,8 +59,36 @@ $(obj)/setup $(obj)/bootsect: %: %.o FORCE
$(obj)/compressed/vmlinux: FORCE $(obj)/compressed/vmlinux: FORCE
$(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@ $(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@
zdisk: $(BOOTIMAGE) # Set this if you want to pass append arguments to the zdisk/fdimage kernel
dd bs=8192 if=$(BOOTIMAGE) of=/dev/fd0 FDARGS =
$(obj)/mtools.conf: $(obj)/mtools.conf.in
sed -e 's|@OBJ@|$(obj)|g' < $< > $@
# This requires write access to /dev/fd0
zdisk: $(BOOTIMAGE) $(obj)/mtools.conf
MTOOLSRC=$(src)/mtools.conf mformat a: ; sync
syslinux /dev/fd0 ; sync
echo 'default linux $(FDARGS)' | \
MTOOLSRC=$(src)/mtools.conf mcopy - a:syslinux.cfg
MTOOLSRC=$(src)/mtools.conf mcopy $(BOOTIMAGE) a:linux ; sync
# These require being root or having syslinux run setuid
fdimage fdimage144: $(BOOTIMAGE) $(src)/mtools.conf
dd if=/dev/zero of=$(obj)/fdimage bs=1024 count=1440
MTOOLSRC=$(src)/mtools.conf mformat v: ; sync
syslinux $(obj)/fdimage ; sync
echo 'default linux $(FDARGS)' | \
MTOOLSRC=$(src)/mtools.conf mcopy - v:syslinux.cfg
MTOOLSRC=$(src)/mtools.conf mcopy $(BOOTIMAGE) v:linux ; sync
fdimage288: $(BOOTIMAGE) $(src)/mtools.conf
dd if=/dev/zero of=$(obj)/fdimage bs=1024 count=2880
MTOOLSRC=$(src)/mtools.conf mformat w: ; sync
syslinux $(obj)/fdimage ; sync
echo 'default linux $(FDARGS)' | \
MTOOLSRC=$(src)/mtools.conf mcopy - w:syslinux.cfg
MTOOLSRC=$(src)/mtools.conf mcopy $(BOOTIMAGE) w:linux ; sync
zlilo: $(BOOTIMAGE) zlilo: $(BOOTIMAGE)
if [ -f $(INSTALL_PATH)/vmlinuz ]; then mv $(INSTALL_PATH)/vmlinuz $(INSTALL_PATH)/vmlinuz.old; fi if [ -f $(INSTALL_PATH)/vmlinuz ]; then mv $(INSTALL_PATH)/vmlinuz $(INSTALL_PATH)/vmlinuz.old; fi
......
...@@ -4,29 +4,13 @@ ...@@ -4,29 +4,13 @@
* modified by Drew Eckhardt * modified by Drew Eckhardt
* modified by Bruce Evans (bde) * modified by Bruce Evans (bde)
* modified by Chris Noe (May 1999) (as86 -> gas) * modified by Chris Noe (May 1999) (as86 -> gas)
* * gutted by H. Peter Anvin (Jan 2003)
* 360k/720k disk support: Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
* *
* BIG FAT NOTE: We're in real mode using 64k segments. Therefore segment * BIG FAT NOTE: We're in real mode using 64k segments. Therefore segment
* addresses must be multiplied by 16 to obtain their respective linear * addresses must be multiplied by 16 to obtain their respective linear
* addresses. To avoid confusion, linear addresses are written using leading * addresses. To avoid confusion, linear addresses are written using leading
* hex while segment addresses are written as segment:offset. * hex while segment addresses are written as segment:offset.
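 * (Worked example: segment:offset 0x9000:0x0200 corresponds to linear
 * address 0x9000*16 + 0x200 = 0x90200.)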
* *
* bde - should not jump blindly, there may be systems with only 512K low
* memory. Use int 0x12 to get the top of memory, etc.
*
* It then loads 'setup' directly after itself (0x90200), and the system
* at 0x10000, using BIOS interrupts.
*
* NOTE! currently system is at most (8*65536-4096) bytes long. This should
* be no problem, even in the future. I want to keep it simple. This 508 kB
* kernel size should be enough, especially as this doesn't contain the
* buffer cache as in minix (and especially now that the kernel is
* compressed :-)
*
* The loader has been made as simple as possible, and continuous
* read errors will result in a unbreakable loop. Reboot by hand. It
* loads pretty fast by getting whole tracks at a time whenever possible.
*/ */
#include <asm/boot.h> #include <asm/boot.h>
...@@ -59,353 +43,51 @@ SWAP_DEV = 0 /* SWAP_DEV is now written by "build" */ ...@@ -59,353 +43,51 @@ SWAP_DEV = 0 /* SWAP_DEV is now written by "build" */
.global _start .global _start
_start: _start:
# First things first. Move ourself from 0x7C00 -> 0x90000 and jump there. # Normalize the start address
jmpl $BOOTSEG, $start2
movw $BOOTSEG, %ax
movw %ax, %ds # %ds = BOOTSEG
movw $INITSEG, %ax
movw %ax, %es # %ax = %es = INITSEG
movw $256, %cx
subw %si, %si
subw %di, %di
cld
rep
movsw
ljmp $INITSEG, $go
# bde - changed 0xff00 to 0x4000 to use debugger at 0x6400 up (bde). We
# wouldn't have to worry about this if we checked the top of memory. Also
# my BIOS can be configured to put the wini drive tables in high memory
# instead of in the vector table. The old stack might have clobbered the
# drive table.
go: movw $0x4000-12, %di # 0x4000 is an arbitrary value >= start2:
# length of bootsect + length of movw %cs, %ax
# setup + room for stack; movw %ax, %ds
# 12 is disk parm size. movw %ax, %es
movw %ax, %ds # %ax and %es already contain INITSEG
movw %ax, %ss movw %ax, %ss
movw %di, %sp # put stack at INITSEG:0x4000-12. movw $0x7c00, %sp
sti
# Many BIOS's default disk parameter tables will not recognize cld
# multi-sector reads beyond the maximum sector number specified
# in the default diskette parameter tables - this may mean 7
# sectors in some cases.
#
# Since single sector reads are slow and out of the question,
# we must take care of this by creating new parameter tables
# (for the first disk) in RAM. We will set the maximum sector
# count to 36 - the most we will encounter on an ED 2.88.
#
# High doesn't hurt. Low does.
#
# Segments are as follows: %cs = %ds = %es = %ss = INITSEG, %fs = 0,
# and %gs is unused.
movw %cx, %fs # %fs = 0
movw $0x78, %bx # %fs:%bx is parameter table address
pushw %ds
ldsw %fs:(%bx), %si # %ds:%si is source
movb $6, %cl # copy 12 bytes
pushw %di # %di = 0x4000-12.
rep # don't worry about cld
movsw # already done above
popw %di
popw %ds
movb $36, 0x4(%di) # patch sector count
movw %di, %fs:(%bx)
movw %es, %fs:2(%bx)
# Get disk drive parameters, specifically number of sectors/track.
# It seems that there is no BIOS call to get the number of sectors. movw $bugger_off_msg, %si
# Guess 36 sectors if sector 36 can be read, 18 sectors if sector 18
# can be read, 15 if sector 15 can be read. Otherwise guess 9.
# Note that %cx = 0 from rep movsw above.
movw $disksizes, %si # table of sizes to try msg_loop:
probe_loop:
lodsb lodsb
cbtw # extend to word andb %al, %al
movw %ax, sectors jz die
cmpw $disksizes+4, %si movb $0xe, %ah
jae got_sectors # If all else fails, try 9
xchgw %cx, %ax # %cx = track and sector
xorw %dx, %dx # drive 0, head 0
movw $0x0200, %bx # address = 512, in INITSEG (%es = %cs)
movw $0x0201, %ax # service 2, 1 sector
int $0x13
jc probe_loop # try next value
got_sectors:
movb $0x03, %ah # read cursor pos
xorb %bh, %bh
int $0x10
movw $9, %cx
movb $0x07, %bl # page 0, attribute 7 (normal)
# %bh is set above; int10 doesn't
# modify it
movw $msg1, %bp
movw $0x1301, %ax # write string, move cursor
int $0x10 # tell the user we're loading..
# Load the setup-sectors directly after the moved bootblock (at 0x90200).
# We should know the drive geometry to do it, as setup may exceed first
# cylinder (for 9-sector 360K and 720K floppies).
movw $0x0001, %ax # set sread (sector-to-read) to 1 as
movw $sread, %si # the boot sector has already been read
movw %ax, (%si)
call kill_motor # reset FDC
movw $0x0200, %bx # address = 512, in INITSEG
next_step:
movb setup_sects, %al
movw sectors, %cx
subw (%si), %cx # (%si) = sread
cmpb %cl, %al
jbe no_cyl_crossing
movw sectors, %ax
subw (%si), %ax # (%si) = sread
no_cyl_crossing:
call read_track
pushw %ax # save it
call set_next # set %bx properly; it uses %ax,%cx,%dx
popw %ax # restore
subb %al, setup_sects # rest - for next step
jnz next_step
pushw $SYSSEG
popw %es # %es = SYSSEG
call read_it
call kill_motor
call print_nl
# After that we check which root-device to use. If the device is
# defined (!= 0), nothing is done and the given device is used.
# Otherwise, one of /dev/fd0H2880 (2,32) or /dev/PS0 (2,28) or /dev/at0 (2,8)
# depending on the number of sectors we pretend to know we have.
# Segments are as follows: %cs = %ds = %ss = INITSEG,
# %es = SYSSEG, %fs = 0, %gs is unused.
movw root_dev, %ax
orw %ax, %ax
jne root_defined
movw sectors, %bx
movw $0x0208, %ax # /dev/ps0 - 1.2Mb
cmpw $15, %bx
je root_defined
movb $0x1c, %al # /dev/PS0 - 1.44Mb
cmpw $18, %bx
je root_defined
movb $0x20, %al # /dev/fd0H2880 - 2.88Mb
cmpw $36, %bx
je root_defined
movb $0, %al # /dev/fd0 - autodetect
root_defined:
movw %ax, root_dev
# After that (everything loaded), we jump to the setup-routine
# loaded directly after the bootblock:
ljmp $SETUPSEG, $0
# These variables are addressed via %si register as it gives shorter code.
sread: .word 0 # sectors read of current track
head: .word 0 # current head
track: .word 0 # current track
# This routine loads the system at address SYSSEG, making sure
# no 64kB boundaries are crossed. We try to load it as fast as
# possible, loading whole tracks whenever we can.
read_it:
movw %es, %ax # %es = SYSSEG when called
testw $0x0fff, %ax
die: jne die # %es must be at 64kB boundary
xorw %bx, %bx # %bx is starting address within segment
rp_read:
#ifdef __BIG_KERNEL__ # look in setup.S for bootsect_kludge
bootsect_kludge = 0x220 # 0x200 + 0x20 which is the size of the
lcall *bootsect_kludge # bootsector + bootsect_kludge offset
#else
movw %es, %ax
subw $SYSSEG, %ax
movw %bx, %cx
shr $4, %cx
add %cx, %ax # check offset
#endif
cmpw syssize, %ax # have we loaded everything yet?
jbe ok1_read
ret
ok1_read:
movw sectors, %ax
subw (%si), %ax # (%si) = sread
movw %ax, %cx
shlw $9, %cx
addw %bx, %cx
jnc ok2_read
je ok2_read
xorw %ax, %ax
subw %bx, %ax
shrw $9, %ax
ok2_read:
call read_track
call set_next
jmp rp_read
read_track:
pusha
pusha
movw $0xe2e, %ax # loading... message 2e = .
movw $7, %bx movw $7, %bx
int $0x10 int $0x10
popa jmp msg_loop
# Accessing head, track, sread via %si gives shorter code.
movw 4(%si), %dx # 4(%si) = track
movw (%si), %cx # (%si) = sread
incw %cx
movb %dl, %ch
movw 2(%si), %dx # 2(%si) = head
movb %dl, %dh
andw $0x0100, %dx
movb $2, %ah
pushw %dx # save for error dump
pushw %cx
pushw %bx
pushw %ax
int $0x13
jc bad_rt
addw $8, %sp
popa
ret
set_next: die:
movw %ax, %cx # Allow the user to press a key, then reboot
addw (%si), %ax # (%si) = sread
cmp sectors, %ax
jne ok3_set
movw $0x0001, %ax
xorw %ax, 2(%si) # change head
jne ok4_set
incw 4(%si) # next track
ok4_set:
xorw %ax, %ax xorw %ax, %ax
ok3_set: int $0x16
movw %ax, (%si) # set sread int $0x19
shlw $9, %cx
addw %cx, %bx
jnc set_next_fin
movw %es, %ax
addb $0x10, %ah
movw %ax, %es
xorw %bx, %bx
set_next_fin:
ret
bad_rt:
pushw %ax # save error code
call print_all # %ah = error, %al = read
xorb %ah, %ah
xorb %dl, %dl
int $0x13
addw $10, %sp
popa
jmp read_track
# print_all is for debugging purposes.
#
# it will print out all of the registers. The assumption is that this is
# called from a routine, with a stack frame like
#
# %dx
# %cx
# %bx
# %ax
# (error)
# ret <- %sp
print_all:
movw $5, %cx # error code + 4 registers
movw %sp, %bp
print_loop:
pushw %cx # save count remaining
call print_nl # <-- for readability
cmpb $5, %cl
jae no_reg # see if register name is needed
movw $0xe05 + 'A' - 1, %ax # int 0x19 should never return. In case it does anyway,
subb %cl, %al # invoke the BIOS reset code...
int $0x10 ljmp $0xf000,$0xfff0
movb $'X', %al
int $0x10
movb $':', %al
int $0x10
no_reg:
addw $2, %bp # next register
call print_hex # print it
popw %cx
loop print_loop
ret
print_nl:
movw $0xe0d, %ax # CR
int $0x10
movb $0xa, %al # LF
int $0x10
ret
# print_hex is for debugging purposes, and prints the word
# pointed to by %ss:%bp in hexadecimal.
print_hex:
movw $4, %cx # 4 hex digits
movw (%bp), %dx # load word into %dx
print_digit:
rolw $4, %dx # rotate to use low 4 bits
movw $0xe0f, %ax # %ah = request
andb %dl, %al # %al = mask for nybble
addb $0x90, %al # convert %al to ascii hex
daa # in only four instructions!
adc $0x40, %al
daa
int $0x10
loop print_digit
ret
# This procedure turns off the floppy drive motor, so
# that we enter the kernel in a known state, and
# don't have to worry about it later.
# NOTE: Doesn't save %ax or %dx; do it yourself if you need to.
kill_motor: bugger_off_msg:
movw $0x3f2, %dx .ascii "Direct booting from floppy is no longer supported.\r\n"
xorb %al, %al .ascii "Please use a boot loader program instead.\r\n"
outb %al, %dx .ascii "\n"
ret .ascii "Remove disk and press any key to reboot . . .\r\n"
.byte 0
sectors: .word 0
disksizes: .byte 36, 18, 15, 9
msg1: .byte 13, 10
.ascii "Loading"
# XXX: This is a fairly snug fit. # Kernel attributes; used by setup
.org 497 .org 497
setup_sects: .byte SETUPSECTS setup_sects: .byte SETUPSECTS
root_flags: .word ROOT_RDONLY root_flags: .word ROOT_RDONLY
syssize: .word SYSSIZE syssize: .word SYSSIZE
......
#
# mtools configuration file for "make (b)zdisk"
#
# Actual floppy drive
drive a:
file="/dev/fd0"
# 1.44 MB floppy disk image
drive v:
file="@OBJ@/fdimage" cylinders=80 heads=2 sectors=18 filter
# 2.88 MB floppy disk image (mostly for virtual uses)
drive w:
file="@OBJ@/fdimage" cylinders=80 heads=2 sectors=36 filter
...@@ -150,13 +150,10 @@ int main(int argc, char ** argv) ...@@ -150,13 +150,10 @@ int main(int argc, char ** argv)
sz = sb.st_size; sz = sb.st_size;
fprintf (stderr, "System is %d kB\n", sz/1024); fprintf (stderr, "System is %d kB\n", sz/1024);
sys_size = (sz + 15) / 16; sys_size = (sz + 15) / 16;
/* 0x28000*16 = 2.5 MB, conservative estimate for the current maximum */ /* 0x40000*16 = 4.0 MB, reasonable estimate for the current maximum */
if (sys_size > (is_big_kernel ? 0x28000 : DEF_SYSSIZE)) if (sys_size > (is_big_kernel ? 0x40000 : DEF_SYSSIZE))
die("System is too big. Try using %smodules.", die("System is too big. Try using %smodules.",
is_big_kernel ? "" : "bzImage or "); is_big_kernel ? "" : "bzImage or ");
if (sys_size > 0xefff)
fprintf(stderr,"warning: kernel is too big for standalone boot "
"from floppy\n");
while (sz > 0) { while (sz > 0) {
int l, n; int l, n;
......
...@@ -5,7 +5,6 @@ CONFIG_X86_64=y ...@@ -5,7 +5,6 @@ CONFIG_X86_64=y
CONFIG_X86=y CONFIG_X86=y
CONFIG_MMU=y CONFIG_MMU=y
CONFIG_SWAP=y CONFIG_SWAP=y
CONFIG_UID16=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_X86_CMPXCHG=y CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y CONFIG_EARLY_PRINTK=y
...@@ -22,12 +21,6 @@ CONFIG_EXPERIMENTAL=y ...@@ -22,12 +21,6 @@ CONFIG_EXPERIMENTAL=y
CONFIG_SYSVIPC=y CONFIG_SYSVIPC=y
# CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y CONFIG_SYSCTL=y
# CONFIG_LOG_BUF_SHIFT_17 is not set
CONFIG_LOG_BUF_SHIFT_16=y
# CONFIG_LOG_BUF_SHIFT_15 is not set
# CONFIG_LOG_BUF_SHIFT_14 is not set
# CONFIG_LOG_BUF_SHIFT_13 is not set
# CONFIG_LOG_BUF_SHIFT_12 is not set
CONFIG_LOG_BUF_SHIFT=16 CONFIG_LOG_BUF_SHIFT=16
# #
...@@ -37,6 +30,7 @@ CONFIG_MODULES=y ...@@ -37,6 +30,7 @@ CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_KMOD is not set # CONFIG_KMOD is not set
# #
...@@ -103,6 +97,7 @@ CONFIG_BINFMT_ELF=y ...@@ -103,6 +97,7 @@ CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_MISC is not set # CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y CONFIG_IA32_EMULATION=y
CONFIG_COMPAT=y CONFIG_COMPAT=y
CONFIG_UID16=y
# #
# Memory Technology Devices (MTD) # Memory Technology Devices (MTD)
...@@ -290,6 +285,7 @@ CONFIG_NETDEVICES=y ...@@ -290,6 +285,7 @@ CONFIG_NETDEVICES=y
# Ethernet (10 or 100Mbit) # Ethernet (10 or 100Mbit)
# #
CONFIG_NET_ETHERNET=y CONFIG_NET_ETHERNET=y
# CONFIG_MII is not set
# CONFIG_HAPPYMEAL is not set # CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set # CONFIG_SUNGEM is not set
# CONFIG_NET_VENDOR_3COM is not set # CONFIG_NET_VENDOR_3COM is not set
...@@ -490,6 +486,7 @@ CONFIG_RTC=y ...@@ -490,6 +486,7 @@ CONFIG_RTC=y
# CONFIG_DRM is not set # CONFIG_DRM is not set
# CONFIG_MWAVE is not set # CONFIG_MWAVE is not set
CONFIG_RAW_DRIVER=y CONFIG_RAW_DRIVER=y
# CONFIG_HANGCHECK_TIMER is not set
# #
# Misc devices # Misc devices
...@@ -615,7 +612,6 @@ CONFIG_DEBUG_KERNEL=y ...@@ -615,7 +612,6 @@ CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_SLAB is not set
CONFIG_MAGIC_SYSRQ=y CONFIG_MAGIC_SYSRQ=y
# CONFIG_DEBUG_SPINLOCK is not set # CONFIG_DEBUG_SPINLOCK is not set
CONFIG_CHECKING=y
# CONFIG_INIT_DEBUG is not set # CONFIG_INIT_DEBUG is not set
CONFIG_KALLSYMS=y CONFIG_KALLSYMS=y
# CONFIG_FRAME_POINTER is not set # CONFIG_FRAME_POINTER is not set
......
...@@ -146,6 +146,7 @@ int restore_i387_ia32(struct task_struct *tsk, struct _fpstate_ia32 *buf, int fs ...@@ -146,6 +146,7 @@ int restore_i387_ia32(struct task_struct *tsk, struct _fpstate_ia32 *buf, int fs
return -1; return -1;
} }
tsk->thread.i387.fxsave.mxcsr &= 0xffbf; tsk->thread.i387.fxsave.mxcsr &= 0xffbf;
current->used_math = 1;
return convert_fxsr_from_user(&tsk->thread.i387.fxsave, buf); return convert_fxsr_from_user(&tsk->thread.i387.fxsave, buf);
} }
......
...@@ -450,7 +450,7 @@ ia32_sys_call_table: ...@@ -450,7 +450,7 @@ ia32_sys_call_table:
.quad sys32_io_getevents .quad sys32_io_getevents
.quad sys32_io_submit .quad sys32_io_submit
.quad sys_io_cancel .quad sys_io_cancel
.quad sys_ni_syscall /* 250 alloc_huge_pages */ .quad sys_fadvise64
.quad sys_ni_syscall /* free_huge_pages */ .quad sys_ni_syscall /* free_huge_pages */
.quad sys_exit_group /* exit_group */ .quad sys_exit_group /* exit_group */
.quad sys_lookup_dcookie .quad sys_lookup_dcookie
......
...@@ -17,7 +17,7 @@ obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o ...@@ -17,7 +17,7 @@ obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o
obj-$(CONFIG_X86_IO_APIC) += io_apic.o mpparse.o obj-$(CONFIG_X86_IO_APIC) += io_apic.o mpparse.o
obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o suspend_asm.o obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o suspend_asm.o
obj-$(CONFIG_ACPI) += acpi.o obj-$(CONFIG_ACPI) += acpi.o
#obj-$(CONFIG_ACPI_SLEEP) += acpi_wakeup.o obj-$(CONFIG_ACPI_SLEEP) += wakeup.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-$(CONFIG_GART_IOMMU) += pci-gart.o aperture.o obj-$(CONFIG_GART_IOMMU) += pci-gart.o aperture.o
obj-$(CONFIG_DUMMY_IOMMU) += pci-nommu.o obj-$(CONFIG_DUMMY_IOMMU) += pci-nommu.o
......
...@@ -44,6 +44,9 @@ ...@@ -44,6 +44,9 @@
#include <asm/pgalloc.h> #include <asm/pgalloc.h>
#include <asm/io_apic.h> #include <asm/io_apic.h>
#include <asm/proto.h> #include <asm/proto.h>
#include <asm/desc.h>
#include <asm/system.h>
#include <asm/segment.h>
extern int acpi_disabled; extern int acpi_disabled;
...@@ -70,7 +73,6 @@ __acpi_map_table ( ...@@ -70,7 +73,6 @@ __acpi_map_table (
if (phys_addr < (end_pfn_map << PAGE_SHIFT)) if (phys_addr < (end_pfn_map << PAGE_SHIFT))
return __va(phys_addr); return __va(phys_addr);
printk("acpi mapping beyond end_pfn: %lx > %lx\n", phys_addr, end_pfn<<PAGE_SHIFT);
return NULL; return NULL;
} }
...@@ -292,8 +294,7 @@ acpi_find_rsdp (void) ...@@ -292,8 +294,7 @@ acpi_find_rsdp (void)
int __init int __init
acpi_boot_init ( acpi_boot_init (void)
char *cmdline)
{ {
int result = 0; int result = 0;
...@@ -306,7 +307,7 @@ acpi_boot_init ( ...@@ -306,7 +307,7 @@ acpi_boot_init (
/* /*
* Initialize the ACPI boot-time table parser. * Initialize the ACPI boot-time table parser.
*/ */
result = acpi_table_init(cmdline); result = acpi_table_init();
if (result) if (result)
return result; return result;
...@@ -441,95 +442,41 @@ acpi_boot_init ( ...@@ -441,95 +442,41 @@ acpi_boot_init (
#ifdef CONFIG_ACPI_SLEEP #ifdef CONFIG_ACPI_SLEEP
#error not ported to x86-64 yet extern void acpi_prepare_wakeup(void);
extern unsigned char acpi_wakeup[], acpi_wakeup_end[], s3_prot16[];
#ifdef DEBUG
#include <linux/serial.h>
#endif
/* address in low memory of the wakeup routine. */ /* address in low memory of the wakeup routine. */
unsigned long acpi_wakeup_address = 0; unsigned long acpi_wakeup_address;
/* new page directory that we will be using */
static pmd_t *pmd;
/* saved page directory */
static pmd_t saved_pmd;
/* page which we'll use for the new page directory */
static pte_t *ptep;
extern unsigned long FASTCALL(acpi_copy_wakeup_routine(unsigned long));
/*
* acpi_create_identity_pmd
*
* Create a new, identity mapped pmd.
*
* Do this by creating new page directory, and marking all the pages as R/W
* Then set it as the new Page Middle Directory.
* And, of course, flush the TLB so it takes effect.
*
* We save the address of the old one, for later restoration.
*/
static void acpi_create_identity_pmd (void)
{
pgd_t *pgd;
int i;
ptep = (pte_t*)__get_free_page(GFP_KERNEL);
/* fill page with low mapping */
for (i = 0; i < PTRS_PER_PTE; i++)
set_pte(ptep + i, mk_pte_phys(i << PAGE_SHIFT, PAGE_SHARED));
pgd = pgd_offset(current->active_mm, 0);
pmd = pmd_alloc(current->mm,pgd, 0);
/* save the old pmd */
saved_pmd = *pmd;
/* set the new one */
set_pmd(pmd, __pmd(_PAGE_TABLE + __pa(ptep)));
/* flush the TLB */
local_flush_tlb();
}
/*
* acpi_restore_pmd
*
* Restore the old pmd saved by acpi_create_identity_pmd and
* free the page that said function alloc'd
*/
static void acpi_restore_pmd (void)
{
set_pmd(pmd, saved_pmd);
local_flush_tlb();
free_page((unsigned long)ptep);
}
/** /**
* acpi_save_state_mem - save kernel state * acpi_save_state_mem - save kernel state
*
* Create an identity mapped page table and copy the wakeup routine to
* low memory.
*/ */
int acpi_save_state_mem (void) int acpi_save_state_mem (void)
{ {
acpi_create_identity_pmd(); if (!acpi_wakeup_address)
acpi_copy_wakeup_routine(acpi_wakeup_address); return -1;
memcpy((void*)acpi_wakeup_address, acpi_wakeup, acpi_wakeup_end - acpi_wakeup);
return 0; return 0;
} }
/** /**
* acpi_save_state_disk - save kernel state to disk * acpi_save_state_disk - save kernel state to disk
* *
* Assume preemption/interrupts are already turned off and that we're running
* on the BP (note this doesn't imply SMP is handled correctly)
*/ */
int acpi_save_state_disk (void) int acpi_save_state_disk (void)
{ {
unsigned long pbase = read_cr3() & PAGE_MASK;
if (pbase >= 0xffffffffUL) {
printk(KERN_ERR "ACPI: High page table. Suspend disabled.\n");
return 1; return 1;
}
set_seg_base(smp_processor_id(), GDT_ENTRY_KERNELCS16, s3_prot16);
swap_low_mappings();
acpi_prepare_wakeup();
return 0;
} }
/* /*
...@@ -537,13 +484,13 @@ int acpi_save_state_disk (void) ...@@ -537,13 +484,13 @@ int acpi_save_state_disk (void)
*/ */
void acpi_restore_state_mem (void) void acpi_restore_state_mem (void)
{ {
acpi_restore_pmd(); swap_low_mappings();
} }
/** /**
* acpi_reserve_bootmem - do _very_ early ACPI initialisation * acpi_reserve_bootmem - do _very_ early ACPI initialisation
* *
 * We allocate a page in low memory for the wakeup * We allocate a page in low memory (below 1MB) for the real-mode wakeup
* routine for when we come back from a sleep state. The * routine for when we come back from a sleep state. The
* runtime allocator allows specification of <16M pages, but not * runtime allocator allows specification of <16M pages, but not
* <1M pages. * <1M pages.
...@@ -551,7 +498,10 @@ void acpi_restore_state_mem (void) ...@@ -551,7 +498,10 @@ void acpi_restore_state_mem (void)
void __init acpi_reserve_bootmem(void) void __init acpi_reserve_bootmem(void)
{ {
acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE); acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE);
printk(KERN_DEBUG "ACPI: have wakeup address 0x%8.8lx\n", acpi_wakeup_address); if (!acpi_wakeup_address) {
printk(KERN_ERR "ACPI: Cannot allocate lowmem. S3 disabled.\n");
return;
}
} }
#endif /*CONFIG_ACPI_SLEEP*/ #endif /*CONFIG_ACPI_SLEEP*/
......
...@@ -57,7 +57,7 @@ static u32 __init allocate_aperture(void) ...@@ -57,7 +57,7 @@ static u32 __init allocate_aperture(void)
printk("Cannot allocate aperture memory hole (%p,%uK)\n", printk("Cannot allocate aperture memory hole (%p,%uK)\n",
p, aper_size>>10); p, aper_size>>10);
if (p) if (p)
free_bootmem((unsigned long)p, aper_size); free_bootmem_node(nd0, (unsigned long)p, aper_size);
return 0; return 0;
} }
printk("Mapping aperture over %d KB of RAM @ %lx\n", printk("Mapping aperture over %d KB of RAM @ %lx\n",
......
...@@ -1026,7 +1026,7 @@ asmlinkage void smp_error_interrupt(void) ...@@ -1026,7 +1026,7 @@ asmlinkage void smp_error_interrupt(void)
irq_exit(); irq_exit();
} }
int disable_apic __initdata; int disable_apic;
/* /*
* This initializes the IO-APIC and APIC hardware if this is * This initializes the IO-APIC and APIC hardware if this is
...@@ -1038,8 +1038,10 @@ int __init APIC_init_uniprocessor (void) ...@@ -1038,8 +1038,10 @@ int __init APIC_init_uniprocessor (void)
printk(KERN_INFO "Apic disabled\n"); printk(KERN_INFO "Apic disabled\n");
return -1; return -1;
} }
if (!smp_found_config && !cpu_has_apic) if (!smp_found_config && !cpu_has_apic) {
disable_apic = 1;
return -1; return -1;
}
/* /*
* Complain if the BIOS pretends there is one. * Complain if the BIOS pretends there is one.
...@@ -1047,6 +1049,7 @@ int __init APIC_init_uniprocessor (void) ...@@ -1047,6 +1049,7 @@ int __init APIC_init_uniprocessor (void)
if (!cpu_has_apic && APIC_INTEGRATED(apic_version[boot_cpu_id])) { if (!cpu_has_apic && APIC_INTEGRATED(apic_version[boot_cpu_id])) {
printk(KERN_ERR "BIOS bug, local APIC #%d not detected!...\n", printk(KERN_ERR "BIOS bug, local APIC #%d not detected!...\n",
boot_cpu_id); boot_cpu_id);
disable_apic = 1;
return -1; return -1;
} }
......
/* /*
* arch/x86_64/kernel/bluesmoke.c - x86-64 Machine Check Exception Reporting * arch/x86_64/kernel/bluesmoke.c - x86-64 Machine Check Exception Reporting
*
RED-PEN: need to add power management to restore after S3 wakeup.
*/ */
#include <linux/init.h> #include <linux/init.h>
......
...@@ -19,13 +19,17 @@ ...@@ -19,13 +19,17 @@
#include <asm/proto.h> #include <asm/proto.h>
#include <asm/bootsetup.h> #include <asm/bootsetup.h>
extern unsigned long table_start, table_end;
extern char _end[]; extern char _end[];
/*
* PFN of last memory page.
*/
unsigned long end_pfn;
/* /*
* end_pfn only includes RAM, while end_pfn_map includes all e820 entries. * end_pfn only includes RAM, while end_pfn_map includes all e820 entries.
* The direct mapping extends to end_pfn_map, so that we can directly access * The direct mapping extends to end_pfn_map, so that we can directly access
* ACPI and other tables without having to play with fixmaps. * apertures, ACPI and other tables without having to play with fixmaps.
*/ */
unsigned long end_pfn_map; unsigned long end_pfn_map;
...@@ -42,18 +46,16 @@ static inline int bad_addr(unsigned long *addrp, unsigned long size) ...@@ -42,18 +46,16 @@ static inline int bad_addr(unsigned long *addrp, unsigned long size)
unsigned long addr = *addrp, last = addr + size; unsigned long addr = *addrp, last = addr + size;
/* various gunk below that needed for SMP startup */ /* various gunk below that needed for SMP startup */
if (addr < 7*PAGE_SIZE) { if (addr < 0x8000) {
*addrp = 7*PAGE_SIZE; *addrp = 0x8000;
return 1; return 1;
} }
#if 0
/* direct mapping tables of the kernel */ /* direct mapping tables of the kernel */
if (last >= table_start<<PAGE_SHIFT && addr < table_end<<PAGE_SHIFT) { if (last >= table_start<<PAGE_SHIFT && addr < table_end<<PAGE_SHIFT) {
*addrp = table_end << PAGE_SHIFT; *addrp = table_end << PAGE_SHIFT;
return 1; return 1;
} }
#endif
/* initrd */ /* initrd */
#ifdef CONFIG_BLK_DEV_INITRD #ifdef CONFIG_BLK_DEV_INITRD
...@@ -145,10 +147,10 @@ void __init e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned lon ...@@ -145,10 +147,10 @@ void __init e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned lon
/* /*
* Find the highest page frame number we have available * Find the highest page frame number we have available
*/ */
void __init e820_end_of_ram(void) unsigned long __init e820_end_of_ram(void)
{ {
int i; int i;
end_pfn = 0; unsigned long end_pfn = 0;
for (i = 0; i < e820.nr_map; i++) { for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i]; struct e820entry *ei = &e820.map[i];
...@@ -175,6 +177,8 @@ void __init e820_end_of_ram(void) ...@@ -175,6 +177,8 @@ void __init e820_end_of_ram(void)
end_pfn = end_user_pfn; end_pfn = end_user_pfn;
if (end_pfn > end_pfn_map) if (end_pfn > end_pfn_map)
end_pfn = end_pfn_map; end_pfn = end_pfn_map;
return end_pfn;
} }
/* /*
......
...@@ -3,6 +3,7 @@ ...@@ -3,6 +3,7 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/string.h> #include <linux/string.h>
#include <asm/io.h> #include <asm/io.h>
#include <asm/processor.h>
/* Simple VGA output */ /* Simple VGA output */
...@@ -104,9 +105,9 @@ static __init void early_serial_init(char *opt) ...@@ -104,9 +105,9 @@ static __init void early_serial_init(char *opt)
s = strsep(&opt, ","); s = strsep(&opt, ",");
if (s != NULL) { if (s != NULL) {
unsigned port; unsigned port;
if (!strncmp(s,"0x",2)) if (!strncmp(s,"0x",2)) {
early_serial_base = simple_strtoul(s, &e, 16); early_serial_base = simple_strtoul(s, &e, 16);
else { } else {
static int bases[] = { 0x3f8, 0x2f8 }; static int bases[] = { 0x3f8, 0x2f8 };
if (!strncmp(s,"ttyS",4)) if (!strncmp(s,"ttyS",4))
s+=4; s+=4;
......
...@@ -512,8 +512,7 @@ ENTRY(spurious_interrupt) ...@@ -512,8 +512,7 @@ ENTRY(spurious_interrupt)
* Exception entry point. This expects an error code/orig_rax on the stack * Exception entry point. This expects an error code/orig_rax on the stack
* and the exception handler in %rax. * and the exception handler in %rax.
*/ */
ALIGN ENTRY(error_entry)
error_entry:
/* rdi slot contains rax, oldrax contains error code */ /* rdi slot contains rax, oldrax contains error code */
pushq %rsi pushq %rsi
movq 8(%rsp),%rsi /* load rax */ movq 8(%rsp),%rsi /* load rax */
...@@ -532,10 +531,7 @@ error_swapgs: ...@@ -532,10 +531,7 @@ error_swapgs:
xorl %ebx,%ebx xorl %ebx,%ebx
swapgs swapgs
error_sti: error_sti:
bt $9,EFLAGS(%rsp) movq %rdi,RDI(%rsp)
jnc 1f
sti
1: movq %rdi,RDI(%rsp)
movq %rsp,%rdi movq %rsp,%rdi
movq ORIG_RAX(%rsp),%rsi /* get error code */ movq ORIG_RAX(%rsp),%rsi /* get error code */
movq $-1,ORIG_RAX(%rsp) movq $-1,ORIG_RAX(%rsp)
...@@ -573,7 +569,8 @@ ENTRY(load_gs_index) ...@@ -573,7 +569,8 @@ ENTRY(load_gs_index)
swapgs swapgs
gs_change: gs_change:
movl %edi,%gs movl %edi,%gs
2: swapgs 2: sfence /* workaround */
swapgs
popf popf
ret ret
......
...@@ -72,8 +72,7 @@ startup_32: ...@@ -72,8 +72,7 @@ startup_32:
/* Setup EFER (Extended Feature Enable Register) */ /* Setup EFER (Extended Feature Enable Register) */
movl $MSR_EFER, %ecx movl $MSR_EFER, %ecx
rdmsr rdmsr
/* Fool rdmsr and reset %eax to avoid dependences */
xorl %eax, %eax
/* Enable Long Mode */ /* Enable Long Mode */
btsl $_EFER_LME, %eax btsl $_EFER_LME, %eax
/* Enable System Call */ /* Enable System Call */
...@@ -112,7 +111,6 @@ reach_compatibility_mode: ...@@ -112,7 +111,6 @@ reach_compatibility_mode:
jnz second jnz second
/* Load new GDT with the 64bit segment using 32bit descriptor */ /* Load new GDT with the 64bit segment using 32bit descriptor */
/* to avoid 32bit relocations we use fixed adresses here */
movl $(pGDT32 - __START_KERNEL_map), %eax movl $(pGDT32 - __START_KERNEL_map), %eax
lgdt (%eax) lgdt (%eax)
...@@ -349,17 +347,14 @@ ENTRY(cpu_gdt_table) ...@@ -349,17 +347,14 @@ ENTRY(cpu_gdt_table)
.quad 0x00cffe000000ffff /* __USER32_CS */ .quad 0x00cffe000000ffff /* __USER32_CS */
.quad 0x00cff2000000ffff /* __USER_DS, __USER32_DS */ .quad 0x00cff2000000ffff /* __USER_DS, __USER32_DS */
.quad 0x00affa000000ffff /* __USER_CS */ .quad 0x00affa000000ffff /* __USER_CS */
.word 0xFFFF # 4Gb - (0x100000*0x1000 = 4Gb) .quad 0x00cf9a000000ffff /* __KERNEL32_CS */
.word 0 # base address = 0
.word 0x9A00 # code read/exec
.word 0x00CF # granularity = 4096, 386
# (+5th nibble of limit)
/* __KERNEL32_CS */
.quad 0,0 /* TSS */ .quad 0,0 /* TSS */
.quad 0 /* LDT */ .quad 0 /* LDT */
.quad 0,0,0 /* three TLS descriptors */ .quad 0,0,0 /* three TLS descriptors */
.quad 0x00cff2000000ffff /* dummy descriptor for long base */ .quad 0 /* unused now */
.quad 0 /* pad to cache line boundary */ .quad 0x00009a000000ffff /* __KERNEL16_CS - 16bit PM for S3 wakeup. */
/* base must be patched for real base address. */
/* This should be a multiple of the cache line size */
gdt_end: gdt_end:
.globl gdt_end .globl gdt_end
......
...@@ -13,6 +13,8 @@ ...@@ -13,6 +13,8 @@
#include <linux/string.h> #include <linux/string.h>
#include <asm/processor.h> #include <asm/processor.h>
#include <asm/proto.h>
#include <asm/smp.h>
/* Don't add a printk in there. printk relies on the PDA which is not initialized /* Don't add a printk in there. printk relies on the PDA which is not initialized
yet. */ yet. */
...@@ -70,9 +72,6 @@ static void __init setup_boot_cpu_data(void) ...@@ -70,9 +72,6 @@ static void __init setup_boot_cpu_data(void)
boot_cpu_data.x86_mask = eax & 0xf; boot_cpu_data.x86_mask = eax & 0xf;
} }
extern void start_kernel(void), pda_init(int), setup_early_printk(char *);
extern int disable_apic;
void __init x86_64_start_kernel(char * real_mode_data) void __init x86_64_start_kernel(char * real_mode_data)
{ {
char *s; char *s;
...@@ -83,6 +82,11 @@ void __init x86_64_start_kernel(char * real_mode_data) ...@@ -83,6 +82,11 @@ void __init x86_64_start_kernel(char * real_mode_data)
s = strstr(saved_command_line, "earlyprintk="); s = strstr(saved_command_line, "earlyprintk=");
if (s != NULL) if (s != NULL)
setup_early_printk(s+12); setup_early_printk(s+12);
#ifdef CONFIG_DISCONTIGMEM
s = strstr(saved_command_line, "numa=");
if (s != NULL)
numa_setup(s+5);
#endif
#ifdef CONFIG_X86_IO_APIC #ifdef CONFIG_X86_IO_APIC
if (strstr(saved_command_line, "disableapic")) if (strstr(saved_command_line, "disableapic"))
disable_apic = 1; disable_apic = 1;
......
...@@ -11,6 +11,7 @@ ...@@ -11,6 +11,7 @@
static struct fs_struct init_fs = INIT_FS; static struct fs_struct init_fs = INIT_FS;
static struct files_struct init_files = INIT_FILES; static struct files_struct init_files = INIT_FILES;
static struct signal_struct init_signals = INIT_SIGNALS(init_signals); static struct signal_struct init_signals = INIT_SIGNALS(init_signals);
static struct sighand_struct init_sighand = INIT_SIGHAND(init_sighand);
struct mm_struct init_mm = INIT_MM(init_mm); struct mm_struct init_mm = INIT_MM(init_mm);
/* /*
......
...@@ -137,7 +137,8 @@ int show_interrupts(struct seq_file *p, void *v) ...@@ -137,7 +137,8 @@ int show_interrupts(struct seq_file *p, void *v)
struct irqaction * action; struct irqaction * action;
seq_printf(p, " "); seq_printf(p, " ");
for_each_cpu(j) for (j=0; j<NR_CPUS; j++)
if (cpu_online(j))
seq_printf(p, "CPU%d ",j); seq_printf(p, "CPU%d ",j);
seq_putc(p, '\n'); seq_putc(p, '\n');
...@@ -149,7 +150,8 @@ int show_interrupts(struct seq_file *p, void *v) ...@@ -149,7 +150,8 @@ int show_interrupts(struct seq_file *p, void *v)
#ifndef CONFIG_SMP #ifndef CONFIG_SMP
seq_printf(p, "%10u ", kstat_irqs(i)); seq_printf(p, "%10u ", kstat_irqs(i));
#else #else
for_each_cpu(j) for (j=0; j<NR_CPUS; j++)
if (cpu_online(j))
seq_printf(p, "%10u ", seq_printf(p, "%10u ",
kstat_cpu(j).irqs[i]); kstat_cpu(j).irqs[i]);
#endif #endif
...@@ -161,12 +163,14 @@ int show_interrupts(struct seq_file *p, void *v) ...@@ -161,12 +163,14 @@ int show_interrupts(struct seq_file *p, void *v)
seq_putc(p, '\n'); seq_putc(p, '\n');
} }
seq_printf(p, "NMI: "); seq_printf(p, "NMI: ");
for_each_cpu(j) for (j = 0; j < NR_CPUS; j++)
if (cpu_online(j))
seq_printf(p, "%10u ", cpu_pda[j].__nmi_count); seq_printf(p, "%10u ", cpu_pda[j].__nmi_count);
seq_putc(p, '\n'); seq_putc(p, '\n');
#if CONFIG_X86_LOCAL_APIC #if CONFIG_X86_LOCAL_APIC
seq_printf(p, "LOC: "); seq_printf(p, "LOC: ");
for_each_cpu(j) for (j = 0; j < NR_CPUS; j++)
if (cpu_online(j))
seq_printf(p, "%10u ", cpu_pda[j].apic_timer_irqs); seq_printf(p, "%10u ", cpu_pda[j].apic_timer_irqs);
seq_putc(p, '\n'); seq_putc(p, '\n');
#endif #endif
......
...@@ -31,6 +31,11 @@ ...@@ -31,6 +31,11 @@
#define DEBUGP(fmt...) #define DEBUGP(fmt...)
/* TODO this should be in vmlist, but we must fix get_vm_area first to
handle out of bounds entries properly.
Also need to fix /proc/kcore, /dev/kmem */
static struct vm_struct *mod_vmlist;
void module_free(struct module *mod, void *module_region) void module_free(struct module *mod, void *module_region)
{ {
struct vm_struct **prevp, *map; struct vm_struct **prevp, *map;
...@@ -40,7 +45,7 @@ void module_free(struct module *mod, void *module_region) ...@@ -40,7 +45,7 @@ void module_free(struct module *mod, void *module_region)
if (!addr) if (!addr)
return; return;
write_lock(&vmlist_lock); write_lock(&vmlist_lock);
for (prevp = &vmlist ; (map = *prevp) ; prevp = &map->next) { for (prevp = &mod_vmlist ; (map = *prevp) ; prevp = &map->next) {
if ((unsigned long)map->addr == addr) { if ((unsigned long)map->addr == addr) {
*prevp = map->next; *prevp = map->next;
write_unlock(&vmlist_lock); write_unlock(&vmlist_lock);
...@@ -81,7 +86,7 @@ void *module_alloc(unsigned long size) ...@@ -81,7 +86,7 @@ void *module_alloc(unsigned long size)
write_lock(&vmlist_lock); write_lock(&vmlist_lock);
addr = (void *) MODULES_VADDR; addr = (void *) MODULES_VADDR;
for (p = &vmlist; (tmp = *p); p = &tmp->next) { for (p = &mod_vmlist; (tmp = *p); p = &tmp->next) {
void *next; void *next;
DEBUGP("vmlist %p %lu addr %p\n", tmp->addr, tmp->size, addr); DEBUGP("vmlist %p %lu addr %p\n", tmp->addr, tmp->size, addr);
if (size + (unsigned long) addr + PAGE_SIZE < (unsigned long) tmp->addr) if (size + (unsigned long) addr + PAGE_SIZE < (unsigned long) tmp->addr)
......
...@@ -29,6 +29,7 @@ ...@@ -29,6 +29,7 @@
#include <asm/mpspec.h> #include <asm/mpspec.h>
#include <asm/pgalloc.h> #include <asm/pgalloc.h>
#include <asm/io_apic.h> #include <asm/io_apic.h>
#include <asm/proto.h>
/* Have we found an MP table */ /* Have we found an MP table */
int smp_found_config; int smp_found_config;
...@@ -83,7 +84,6 @@ extern int acpi_parse_ioapic (acpi_table_entry_header *header); ...@@ -83,7 +84,6 @@ extern int acpi_parse_ioapic (acpi_table_entry_header *header);
* Intel MP BIOS table parsing routines: * Intel MP BIOS table parsing routines:
*/ */
#ifndef CONFIG_X86_VISWS_APIC
/* /*
* Checksum an MP configuration block. * Checksum an MP configuration block.
*/ */
...@@ -582,9 +582,9 @@ static int __init smp_scan_config (unsigned long base, unsigned long length) ...@@ -582,9 +582,9 @@ static int __init smp_scan_config (unsigned long base, unsigned long length)
smp_found_config = 1; smp_found_config = 1;
printk("found SMP MP-table at %08lx\n", printk("found SMP MP-table at %08lx\n",
virt_to_phys(mpf)); virt_to_phys(mpf));
reserve_bootmem(virt_to_phys(mpf), PAGE_SIZE); reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE);
if (mpf->mpf_physptr) if (mpf->mpf_physptr)
reserve_bootmem(mpf->mpf_physptr, PAGE_SIZE); reserve_bootmem_generic(mpf->mpf_physptr, PAGE_SIZE);
mpf_found = mpf; mpf_found = mpf;
return 1; return 1;
} }
...@@ -632,38 +632,14 @@ void __init find_intel_smp (void) ...@@ -632,38 +632,14 @@ void __init find_intel_smp (void)
printk(KERN_WARNING "WARNING: MP table in the EBDA can be UNSAFE, contact linux-smp@vger.kernel.org if you experience SMP problems!\n"); printk(KERN_WARNING "WARNING: MP table in the EBDA can be UNSAFE, contact linux-smp@vger.kernel.org if you experience SMP problems!\n");
} }
#else
/*
* The Visual Workstation is Intel MP compliant in the hardware
* sense, but it doesnt have a BIOS(-configuration table).
* No problem for Linux.
*/
void __init find_visws_smp(void)
{
smp_found_config = 1;
phys_cpu_present_map |= 2; /* or in id 1 */
apic_version[1] |= 0x10; /* integrated APIC */
apic_version[0] |= 0x10;
mp_lapic_addr = APIC_DEFAULT_PHYS_BASE;
}
#endif
/* /*
* - Intel MP Configuration Table * - Intel MP Configuration Table
* - or SGI Visual Workstation configuration
*/ */
void __init find_smp_config (void) void __init find_smp_config (void)
{ {
#ifdef CONFIG_X86_LOCAL_APIC #ifdef CONFIG_X86_LOCAL_APIC
find_intel_smp(); find_intel_smp();
#endif #endif
#ifdef CONFIG_VISWS
find_visws_smp();
#endif
} }
......
...@@ -22,6 +22,9 @@ ...@@ -22,6 +22,9 @@
* *
* This driver uses /dev/cpu/%d/msr where %d is the minor number, and on * This driver uses /dev/cpu/%d/msr where %d is the minor number, and on
* an SMP box will direct the access to CPU %d. * an SMP box will direct the access to CPU %d.
RED-PEN: need to get power management for S3 restore
*/ */
#include <linux/module.h> #include <linux/module.h>
......
...@@ -24,6 +24,7 @@ ...@@ -24,6 +24,7 @@
#include <asm/mtrr.h> #include <asm/mtrr.h>
#include <asm/mpspec.h> #include <asm/mpspec.h>
#include <asm/nmi.h> #include <asm/nmi.h>
#include <asm/msr.h>
extern void default_do_nmi(struct pt_regs *); extern void default_do_nmi(struct pt_regs *);
...@@ -71,13 +72,14 @@ int __init check_nmi_watchdog (void) ...@@ -71,13 +72,14 @@ int __init check_nmi_watchdog (void)
printk(KERN_INFO "testing NMI watchdog ... "); printk(KERN_INFO "testing NMI watchdog ... ");
for_each_cpu(cpu) { for (cpu = 0; cpu < NR_CPUS; cpu++)
counts[cpu] = cpu_pda[cpu].__nmi_count; counts[cpu] = cpu_pda[cpu].__nmi_count;
}
local_irq_enable(); local_irq_enable();
mdelay((10*1000)/nmi_hz); // wait 10 ticks mdelay((10*1000)/nmi_hz); // wait 10 ticks
for_each_cpu(cpu) { for (cpu = 0; cpu < NR_CPUS; cpu++) {
if (!cpu_online(cpu))
continue;
if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) { if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) {
printk("CPU#%d: NMI appears to be stuck (%d)!\n", printk("CPU#%d: NMI appears to be stuck (%d)!\n",
cpu, cpu,
...@@ -173,7 +175,7 @@ static inline void nmi_pm_init(void) { } ...@@ -173,7 +175,7 @@ static inline void nmi_pm_init(void) { }
* Original code written by Keith Owens. * Original code written by Keith Owens.
*/ */
static void __pminit setup_k7_watchdog(void) static void setup_k7_watchdog(void)
{ {
int i; int i;
unsigned int evntsel; unsigned int evntsel;
...@@ -183,8 +185,10 @@ static void __pminit setup_k7_watchdog(void) ...@@ -183,8 +185,10 @@ static void __pminit setup_k7_watchdog(void)
nmi_perfctr_msr = MSR_K7_PERFCTR0; nmi_perfctr_msr = MSR_K7_PERFCTR0;
for(i = 0; i < 4; ++i) { for(i = 0; i < 4; ++i) {
wrmsr(MSR_K7_EVNTSEL0+i, 0, 0); /* Simulator may not support it */
wrmsr(MSR_K7_PERFCTR0+i, 0, 0); if (checking_wrmsrl(MSR_K7_EVNTSEL0+i, 0UL))
return;
wrmsrl(MSR_K7_PERFCTR0+i, 0UL);
} }
evntsel = K7_EVNTSEL_INT evntsel = K7_EVNTSEL_INT
...@@ -200,16 +204,12 @@ static void __pminit setup_k7_watchdog(void) ...@@ -200,16 +204,12 @@ static void __pminit setup_k7_watchdog(void)
wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
} }
void __pminit setup_apic_nmi_watchdog (void) void setup_apic_nmi_watchdog (void)
{ {
switch (boot_cpu_data.x86_vendor) { switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD: case X86_VENDOR_AMD:
if (boot_cpu_data.x86 < 6) if (boot_cpu_data.x86 < 6)
return; return;
/* Simics masquerades as AMD, but does not support
performance counters */
if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
return;
setup_k7_watchdog(); setup_k7_watchdog();
break; break;
default: default:
......
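For reference, the probe-before-use pattern that replaces the "Screwdriver" model-string check can be sketched as below; checking_wrmsrl() is the fault-tolerant MSR write used in the hunk above, and the helper name here is hypothetical:

/* Hypothetical helper sketching the probe above: checking_wrmsrl()
 * returns non-zero if the MSR write faulted (e.g. under a simulator
 * without K7 performance counters), so the watchdog backs off instead
 * of oopsing on the first wrmsr. */
static int k7_perfctr_probe(void)
{
	int i;
	for (i = 0; i < 4; i++) {
		if (checking_wrmsrl(MSR_K7_EVNTSEL0 + i, 0UL))
			return -ENODEV;			/* faulted: no perfctrs */
		wrmsrl(MSR_K7_PERFCTR0 + i, 0UL);	/* clear the counter */
	}
	return 0;
}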
...@@ -366,12 +366,15 @@ void __switch_to(struct task_struct *prev_p, struct task_struct *next_p) ...@@ -366,12 +366,15 @@ void __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
also reload when it has changed. also reload when it has changed.
when prev process used 64bit base always reload when prev process used 64bit base always reload
to avoid an information leak. */ to avoid an information leak. */
if (unlikely((fsindex | next->fsindex) || prev->fs)) if (unlikely(fsindex | next->fsindex | prev->fs)) {
loadsegment(fs, next->fsindex); loadsegment(fs, next->fsindex);
/* check if the user changed the selector /* check if the user used a selector != 0
if yes clear 64bit base. */ * if yes clear 64bit base, since overloaded base
if (unlikely(fsindex != prev->fsindex)) * is always mapped to the Null selector
*/
if (fsindex)
prev->fs = 0; prev->fs = 0;
}
/* when next process has a 64bit base use it */ /* when next process has a 64bit base use it */
if (next->fs) if (next->fs)
wrmsrl(MSR_FS_BASE, next->fs); wrmsrl(MSR_FS_BASE, next->fs);
...@@ -380,10 +383,11 @@ void __switch_to(struct task_struct *prev_p, struct task_struct *next_p) ...@@ -380,10 +383,11 @@ void __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{ {
unsigned gsindex; unsigned gsindex;
asm volatile("movl %%gs,%0" : "=g" (gsindex)); asm volatile("movl %%gs,%0" : "=g" (gsindex));
if (unlikely((gsindex | next->gsindex) || prev->gs)) if (unlikely(gsindex | next->gsindex | prev->gs)) {
load_gs_index(next->gsindex); load_gs_index(next->gsindex);
if (unlikely(gsindex != prev->gsindex)) if (gsindex)
prev->gs = 0; prev->gs = 0;
}
if (next->gs) if (next->gs)
wrmsrl(MSR_KERNEL_GS_BASE, next->gs); wrmsrl(MSR_KERNEL_GS_BASE, next->gs);
prev->gsindex = gsindex; prev->gsindex = gsindex;
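The new reload condition can be read as a three-way OR; a minimal sketch of the test applied to both fs and gs (helper name hypothetical):

/* Sketch of the segment reload test used above. Any of the three being
 * non-zero forces a selector reload:
 *  - prev's live selector (would otherwise leak into next),
 *  - next's saved selector (must be installed),
 *  - prev's 64-bit base (reloading selector 0 clears it). */
static inline int segment_needs_reload(unsigned prev_sel, unsigned next_sel,
					unsigned long prev_base)
{
	return (prev_sel | next_sel) != 0 || prev_base != 0;
}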
...@@ -537,29 +541,23 @@ int sys_arch_prctl(int code, unsigned long addr) ...@@ -537,29 +541,23 @@ int sys_arch_prctl(int code, unsigned long addr)
switch (code) { switch (code) {
case ARCH_SET_GS: case ARCH_SET_GS:
#if 1
/* For now. We still have one unsolved bug in long gs base context
switch handling. */
return -EINVAL;
#else
if (addr >= TASK_SIZE) if (addr >= TASK_SIZE)
return -EPERM; return -EPERM;
get_cpu(); get_cpu();
load_gs_index(__USER_LONGBASE); load_gs_index(0);
current->thread.gsindex = __USER_LONGBASE; current->thread.gsindex = 0;
current->thread.gs = addr; current->thread.gs = addr;
ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr); ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr);
put_cpu(); put_cpu();
break; break;
#endif
case ARCH_SET_FS: case ARCH_SET_FS:
/* Not strictly needed for fs, but do it for symmetry /* Not strictly needed for fs, but do it for symmetry
with gs */ with gs */
if (addr >= TASK_SIZE) if (addr >= TASK_SIZE)
return -EPERM; return -EPERM;
get_cpu(); get_cpu();
asm volatile("movl %0,%%fs" :: "r" (__USER_LONGBASE)); asm volatile("movl %0,%%fs" :: "r" (0));
current->thread.fsindex = __USER_LONGBASE; current->thread.fsindex = 0;
current->thread.fs = addr; current->thread.fs = addr;
ret = checking_wrmsrl(MSR_FS_BASE, addr); ret = checking_wrmsrl(MSR_FS_BASE, addr);
put_cpu(); put_cpu();
......
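A userspace usage sketch of the interface this hunk unblocks; the ARCH_* values are taken from asm/prctl.h, and ARCH_GET_FS is assumed available alongside ARCH_SET_FS:

#include <unistd.h>
#include <sys/syscall.h>

#define ARCH_SET_FS 0x1002	/* from asm/prctl.h */
#define ARCH_GET_FS 0x1003

static unsigned long tls_block[64];

int main(void)
{
	unsigned long base = 0;
	/* point the FS base at a private TLS block, then read it back */
	syscall(SYS_arch_prctl, ARCH_SET_FS, (unsigned long)tls_block);
	syscall(SYS_arch_prctl, ARCH_GET_FS, &base);
	return base == (unsigned long)tls_block ? 0 : 1;
}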
...@@ -38,6 +38,7 @@ ...@@ -38,6 +38,7 @@
#include <linux/seq_file.h> #include <linux/seq_file.h>
#include <linux/root_dev.h> #include <linux/root_dev.h>
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/acpi.h>
#include <asm/mtrr.h> #include <asm/mtrr.h>
#include <asm/uaccess.h> #include <asm/uaccess.h>
#include <asm/system.h> #include <asm/system.h>
...@@ -53,6 +54,8 @@ ...@@ -53,6 +54,8 @@
#include <asm/smp.h> #include <asm/smp.h>
#include <asm/proto.h> #include <asm/proto.h>
#define Dprintk(x...) printk(x)
/* /*
* Machine setup.. * Machine setup..
*/ */
...@@ -236,6 +239,8 @@ void __init setup_arch(char **cmdline_p) ...@@ -236,6 +239,8 @@ void __init setup_arch(char **cmdline_p)
{ {
int i; int i;
Dprintk("setup_arch\n");
ROOT_DEV = ORIG_ROOT_DEV; ROOT_DEV = ORIG_ROOT_DEV;
drive_info = DRIVE_INFO; drive_info = DRIVE_INFO;
screen_info = SCREEN_INFO; screen_info = SCREEN_INFO;
...@@ -263,34 +268,34 @@ void __init setup_arch(char **cmdline_p) ...@@ -263,34 +268,34 @@ void __init setup_arch(char **cmdline_p)
parse_cmdline_early(cmdline_p); parse_cmdline_early(cmdline_p);
#define PFN_UP(x) (((x) + PAGE_SIZE-1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_PHYS(x) ((x) << PAGE_SHIFT)
#define MAXMEM (120UL * 1024 * 1024 * 1024 * 1024) /* 120TB */
#define MAXMEM_PFN PFN_DOWN(MAXMEM)
#define MAX_NONPAE_PFN (1 << 20)
/* /*
* partially used pages are not usable - thus * partially used pages are not usable - thus
* we are rounding upwards: * we are rounding upwards:
*/ */
start_pfn = PFN_UP(__pa_symbol(&_end)); end_pfn = e820_end_of_ram();
e820_end_of_ram();
init_memory_mapping(); init_memory_mapping();
#ifdef CONFIG_DISCONTIGMEM
numa_initmem_init(0, end_pfn);
#else
contig_initmem_init(); contig_initmem_init();
#endif
/* Reserve direct mapping */
reserve_bootmem_generic(table_start << PAGE_SHIFT,
(table_end - table_start) << PAGE_SHIFT);
/* reserve kernel */ /* reserve kernel */
reserve_bootmem(HIGH_MEMORY, PFN_PHYS(start_pfn) - HIGH_MEMORY); unsigned long kernel_end;
kernel_end = round_up(__pa_symbol(&_end),PAGE_SIZE);
reserve_bootmem_generic(HIGH_MEMORY, kernel_end - HIGH_MEMORY);
/* /*
* reserve physical page 0 - it's a special BIOS page on many boxes, * reserve physical page 0 - it's a special BIOS page on many boxes,
* enabling clean reboots, SMP operation, laptop functions. * enabling clean reboots, SMP operation, laptop functions.
*/ */
reserve_bootmem(0, PAGE_SIZE); reserve_bootmem_generic(0, PAGE_SIZE);
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
/* /*
...@@ -298,8 +303,12 @@ void __init setup_arch(char **cmdline_p) ...@@ -298,8 +303,12 @@ void __init setup_arch(char **cmdline_p)
* FIXME: Don't need the extra page at 4K, but need to fix * FIXME: Don't need the extra page at 4K, but need to fix
* trampoline before removing it. (see the GDT stuff) * trampoline before removing it. (see the GDT stuff)
*/ */
reserve_bootmem(PAGE_SIZE, PAGE_SIZE); reserve_bootmem_generic(PAGE_SIZE, PAGE_SIZE);
/* Reserve SMP trampoline */
reserve_bootmem_generic(SMP_TRAMPOLINE_BASE, PAGE_SIZE);
#endif #endif
#ifdef CONFIG_ACPI_SLEEP #ifdef CONFIG_ACPI_SLEEP
/* /*
* Reserve low memory region for sleep support. * Reserve low memory region for sleep support.
...@@ -315,7 +324,7 @@ void __init setup_arch(char **cmdline_p) ...@@ -315,7 +324,7 @@ void __init setup_arch(char **cmdline_p)
#ifdef CONFIG_BLK_DEV_INITRD #ifdef CONFIG_BLK_DEV_INITRD
if (LOADER_TYPE && INITRD_START) { if (LOADER_TYPE && INITRD_START) {
if (INITRD_START + INITRD_SIZE <= (end_pfn << PAGE_SHIFT)) { if (INITRD_START + INITRD_SIZE <= (end_pfn << PAGE_SHIFT)) {
reserve_bootmem(INITRD_START, INITRD_SIZE); reserve_bootmem_generic(INITRD_START, INITRD_SIZE);
initrd_start = initrd_start =
INITRD_START ? INITRD_START + PAGE_OFFSET : 0; INITRD_START ? INITRD_START + PAGE_OFFSET : 0;
initrd_end = initrd_start+INITRD_SIZE; initrd_end = initrd_start+INITRD_SIZE;
...@@ -330,14 +339,6 @@ void __init setup_arch(char **cmdline_p) ...@@ -330,14 +339,6 @@ void __init setup_arch(char **cmdline_p)
} }
#endif #endif
/*
* NOTE: before this point _nobody_ is allowed to allocate
* any memory using the bootmem allocator.
*/
#ifdef CONFIG_SMP
smp_alloc_memory(); /* AP processor realmode stacks in low memory*/
#endif
paging_init(); paging_init();
#ifdef CONFIG_ACPI_BOOT #ifdef CONFIG_ACPI_BOOT
/* /*
...@@ -347,7 +348,7 @@ void __init setup_arch(char **cmdline_p) ...@@ -347,7 +348,7 @@ void __init setup_arch(char **cmdline_p)
* of MADT). * of MADT).
*/ */
if (!acpi_disabled) if (!acpi_disabled)
acpi_boot_init(*cmdline_p); acpi_boot_init();
#endif #endif
#ifdef CONFIG_X86_LOCAL_APIC #ifdef CONFIG_X86_LOCAL_APIC
/* /*
......
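The kernel_end computation above relies on round_up(); for reference, assuming the common power-of-two definition:

/* assumed definition; y must be a power of two */
#define round_up(x, y)	(((x) + (y) - 1) & ~((y) - 1))

/* e.g. round_up(0x1234, PAGE_SIZE) == 0x2000 with 4 KB pages */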
/* /*
* X86-64 specific CPU setup. * X86-64 specific CPU setup.
* Copyright (C) 1995 Linus Torvalds * Copyright (C) 1995 Linus Torvalds
* Copyright 2001, 2002 SuSE Labs / Andi Kleen. * Copyright 2001, 2002, 2003 SuSE Labs / Andi Kleen.
* See setup.c for older changelog. * See setup.c for older changelog.
* $Id: setup64.c,v 1.12 2002/03/21 10:09:17 ak Exp $ * $Id: setup64.c,v 1.12 2002/03/21 10:09:17 ak Exp $
*/ */
...@@ -90,6 +90,17 @@ void pda_init(int cpu) ...@@ -90,6 +90,17 @@ void pda_init(int cpu)
pml4_t *level4; pml4_t *level4;
struct x8664_pda *pda = &cpu_pda[cpu]; struct x8664_pda *pda = &cpu_pda[cpu];
/* Setup up data that may be needed in __get_free_pages early */
asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0));
wrmsrl(MSR_GS_BASE, cpu_pda + cpu);
pda->me = pda;
pda->cpunumber = cpu;
pda->irqcount = -1;
pda->cpudata_offset = 0;
pda->kernelstack =
(unsigned long)current_thread_info() - PDA_STACKOFFSET + THREAD_SIZE;
if (cpu == 0) { if (cpu == 0) {
/* others are initialized in smpboot.c */ /* others are initialized in smpboot.c */
pda->pcurrent = &init_task; pda->pcurrent = &init_task;
...@@ -112,18 +123,8 @@ void pda_init(int cpu) ...@@ -112,18 +123,8 @@ void pda_init(int cpu)
asm volatile("movq %0,%%cr3" :: "r" (__pa(level4))); asm volatile("movq %0,%%cr3" :: "r" (__pa(level4)));
pda->irqstackptr += IRQSTACKSIZE-64; pda->irqstackptr += IRQSTACKSIZE-64;
pda->cpunumber = cpu;
pda->irqcount = -1;
pda->kernelstack =
(unsigned long)stack_thread_info() - PDA_STACKOFFSET + THREAD_SIZE;
pda->me = pda;
pda->cpudata_offset = 0;
pda->active_mm = &init_mm; pda->active_mm = &init_mm;
pda->mmu_state = 0; pda->mmu_state = 0;
asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0));
wrmsrl(MSR_GS_BASE, cpu_pda + cpu);
} }
#define EXCEPTION_STK_ORDER 0 /* >= N_EXCEPTION_STACKS*EXCEPTION_STKSZ */ #define EXCEPTION_STK_ORDER 0 /* >= N_EXCEPTION_STACKS*EXCEPTION_STKSZ */
...@@ -150,10 +151,10 @@ void __init cpu_init (void) ...@@ -150,10 +151,10 @@ void __init cpu_init (void)
/* CPU 0 is initialised in head64.c */ /* CPU 0 is initialised in head64.c */
if (cpu != 0) { if (cpu != 0) {
pda_init(cpu);
estacks = (char *)__get_free_pages(GFP_ATOMIC, 0); estacks = (char *)__get_free_pages(GFP_ATOMIC, 0);
if (!estacks) if (!estacks)
panic("Can't allocate exception stacks for CPU %d\n",cpu); panic("Can't allocate exception stacks for CPU %d\n",cpu);
pda_init(cpu);
} else } else
estacks = boot_exception_stacks; estacks = boot_exception_stacks;
......
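Once MSR_GS_BASE points at the PDA, every per-CPU field is one %gs-relative load away, which is why pda_init() must run this early; a read_pda-style access sketch (helper hypothetical):

/* Hypothetical sketch of %gs-relative PDA access once pda_init() has
 * run; the 'me' field lets code recover the PDA's linear address.
 * offsetof() comes from linux/stddef.h. */
static inline struct x8664_pda *my_pda(void)
{
	struct x8664_pda *p;
	asm("movq %%gs:%c1,%0"
	    : "=r" (p)
	    : "i" (offsetof(struct x8664_pda, me)));
	return p;
}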
...@@ -353,7 +353,7 @@ static void ...@@ -353,7 +353,7 @@ static void
handle_signal(unsigned long sig, siginfo_t *info, sigset_t *oldset, handle_signal(unsigned long sig, siginfo_t *info, sigset_t *oldset,
struct pt_regs * regs) struct pt_regs * regs)
{ {
struct k_sigaction *ka = &current->sig->action[sig-1]; struct k_sigaction *ka = &current->sighand->action[sig-1];
#if DEBUG_SIG #if DEBUG_SIG
printk("handle_signal pid:%d sig:%lu rip:%lx rsp:%lx regs=%p\n", current->pid, sig, printk("handle_signal pid:%d sig:%lu rip:%lx rsp:%lx regs=%p\n", current->pid, sig,
......
...@@ -3,6 +3,7 @@ ...@@ -3,6 +3,7 @@
* *
* (c) 1995 Alan Cox, Building #3 <alan@redhat.com> * (c) 1995 Alan Cox, Building #3 <alan@redhat.com>
* (c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com> * (c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com>
* (c) 2002,2003 Andi Kleen, SuSE Labs.
* *
* This code is released under the GNU General Public License version 2 or * This code is released under the GNU General Public License version 2 or
* later. * later.
...@@ -491,3 +492,24 @@ asmlinkage void smp_call_function_interrupt(void) ...@@ -491,3 +492,24 @@ asmlinkage void smp_call_function_interrupt(void)
} }
} }
/* Slow. Should be only used for debugging. */
int slow_smp_processor_id(void)
{
int stack_location;
unsigned long sp = (unsigned long)&stack_location;
int cpu;
unsigned long mask;
for_each_cpu(cpu, mask) {
if (sp >= (u64)cpu_pda[cpu].irqstackptr - IRQSTACKSIZE &&
sp <= (u64)cpu_pda[cpu].irqstackptr)
return cpu;
unsigned long estack = init_tss[cpu].ist[0] - EXCEPTION_STKSZ;
if (sp >= estack && sp <= estack+(1<<(PAGE_SHIFT+EXCEPTION_STK_ORDER)))
return cpu;
}
return stack_smp_processor_id();
}
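The final fallback works because thread_info sits at the base of the THREAD_SIZE-aligned kernel stack; a sketch of that derivation, assuming the usual x86-64 layout:

/* Sketch of the stack-based fallback: mask the stack pointer down to
 * the THREAD_SIZE boundary and read thread_info->cpu from there. */
static inline int stack_cpu_id(void)
{
	unsigned long rsp;
	asm("movq %%rsp,%0" : "=r" (rsp));
	return ((struct thread_info *)(rsp & ~(THREAD_SIZE - 1)))->cpu;
}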
...@@ -51,13 +51,10 @@ ...@@ -51,13 +51,10 @@
#include <asm/kdebug.h> #include <asm/kdebug.h>
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
/* Bitmask of currently online CPUs */ extern int disable_apic;
unsigned long cpu_online_map;
/* which CPU (physical APIC ID) maps to which logical CPU number */ /* Bitmask of currently online CPUs */
volatile int x86_apicid_to_cpu[NR_CPUS]; unsigned long cpu_online_map = 1;
/* which logical CPU number maps to which CPU (physical APIC ID) */
volatile int x86_cpu_to_apicid[NR_CPUS];
static volatile unsigned long cpu_callin_map; static volatile unsigned long cpu_callin_map;
volatile unsigned long cpu_callout_map; volatile unsigned long cpu_callout_map;
...@@ -75,7 +72,6 @@ int smp_threads_ready; ...@@ -75,7 +72,6 @@ int smp_threads_ready;
extern unsigned char trampoline_data []; extern unsigned char trampoline_data [];
extern unsigned char trampoline_end []; extern unsigned char trampoline_end [];
static unsigned char *trampoline_base;
/* /*
* Currently trivial. Write the real->protected mode * Currently trivial. Write the real->protected mode
...@@ -85,25 +81,11 @@ static unsigned char *trampoline_base; ...@@ -85,25 +81,11 @@ static unsigned char *trampoline_base;
static unsigned long __init setup_trampoline(void) static unsigned long __init setup_trampoline(void)
{ {
void *tramp = __va(SMP_TRAMPOLINE_BASE);
extern volatile __u32 tramp_gdt_ptr; extern volatile __u32 tramp_gdt_ptr;
tramp_gdt_ptr = __pa_symbol(&cpu_gdt_table); tramp_gdt_ptr = __pa_symbol(&cpu_gdt_table);
memcpy(trampoline_base, trampoline_data, trampoline_end - trampoline_data); memcpy(tramp, trampoline_data, trampoline_end - trampoline_data);
return virt_to_phys(trampoline_base); return virt_to_phys(tramp);
}
/*
* We are called very early to get the low memory for the
* SMP bootup trampoline page.
*/
void __init smp_alloc_memory(void)
{
trampoline_base = (void *) alloc_bootmem_low_pages(PAGE_SIZE);
/*
* Has to be in very low memory so we can execute
* real-mode AP code.
*/
if (__pa(trampoline_base) >= 0x9F000)
BUG();
} }
/* /*
...@@ -174,6 +156,7 @@ static void __init synchronize_tsc_bp (void) ...@@ -174,6 +156,7 @@ static void __init synchronize_tsc_bp (void)
*/ */
atomic_inc(&tsc_count_start); atomic_inc(&tsc_count_start);
sync_core();
rdtscll(tsc_values[smp_processor_id()]); rdtscll(tsc_values[smp_processor_id()]);
/* /*
* We clear the TSC in the last loop: * We clear the TSC in the last loop:
...@@ -245,6 +228,7 @@ static void __init synchronize_tsc_ap (void) ...@@ -245,6 +228,7 @@ static void __init synchronize_tsc_ap (void)
atomic_inc(&tsc_count_start); atomic_inc(&tsc_count_start);
while (atomic_read(&tsc_count_start) != num_booting_cpus()) mb(); while (atomic_read(&tsc_count_start) != num_booting_cpus()) mb();
sync_core();
rdtscll(tsc_values[smp_processor_id()]); rdtscll(tsc_values[smp_processor_id()]);
if (i == NR_LOOPS-1) if (i == NR_LOOPS-1)
write_tsc(0, 0); write_tsc(0, 0);
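sync_core() is inserted before each rdtsc so the read cannot be speculated ahead of the synchronization point; the conventional x86-64 implementation (assumed here) is a serializing cpuid:

/* assumed implementation: cpuid is architecturally serializing, so no
 * younger rdtsc can execute before older instructions retire */
static inline void sync_core(void)
{
	int tmp;
	asm volatile("cpuid"
		     : "=a" (tmp)
		     : "0" (1)
		     : "ebx", "ecx", "edx", "memory");
}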
...@@ -369,6 +353,9 @@ int __init start_secondary(void *unused) ...@@ -369,6 +353,9 @@ int __init start_secondary(void *unused)
cpu_init(); cpu_init();
smp_callin(); smp_callin();
/* otherwise gcc will move up the smp_processor_id before the cpu_init */
barrier();
Dprintk("cpu %d: waiting for commence\n", smp_processor_id()); Dprintk("cpu %d: waiting for commence\n", smp_processor_id());
while (!test_bit(smp_processor_id(), &smp_commenced_mask)) while (!test_bit(smp_processor_id(), &smp_commenced_mask))
rep_nop(); rep_nop();
...@@ -620,8 +607,6 @@ static void __init do_boot_cpu (int apicid) ...@@ -620,8 +607,6 @@ static void __init do_boot_cpu (int apicid)
*/ */
init_idle(idle,cpu); init_idle(idle,cpu);
x86_cpu_to_apicid[cpu] = apicid;
x86_apicid_to_cpu[apicid] = cpu;
idle->thread.rip = (unsigned long)start_secondary; idle->thread.rip = (unsigned long)start_secondary;
// idle->thread.rsp = (unsigned long)idle->thread_info + THREAD_SIZE - 512; // idle->thread.rsp = (unsigned long)idle->thread_info + THREAD_SIZE - 512;
...@@ -713,8 +698,6 @@ static void __init do_boot_cpu (int apicid) ...@@ -713,8 +698,6 @@ static void __init do_boot_cpu (int apicid)
} }
} }
if (boot_error) { if (boot_error) {
x86_cpu_to_apicid[cpu] = -1;
x86_apicid_to_cpu[apicid] = -1;
clear_bit(cpu, &cpu_callout_map); /* was set here (do_boot_cpu()) */ clear_bit(cpu, &cpu_callout_map); /* was set here (do_boot_cpu()) */
clear_bit(cpu, &cpu_initialized); /* was set by cpu_init() */ clear_bit(cpu, &cpu_initialized); /* was set by cpu_init() */
cpucount--; cpucount--;
...@@ -776,14 +759,6 @@ static void __init smp_boot_cpus(unsigned int max_cpus) ...@@ -776,14 +759,6 @@ static void __init smp_boot_cpus(unsigned int max_cpus)
{ {
int apicid, cpu; int apicid, cpu;
/*
* Initialize the logical to physical CPU number mapping
*/
for (apicid = 0; apicid < NR_CPUS; apicid++) {
x86_apicid_to_cpu[apicid] = -1;
}
/* /*
* Setup boot CPU information * Setup boot CPU information
*/ */
...@@ -791,8 +766,6 @@ static void __init smp_boot_cpus(unsigned int max_cpus) ...@@ -791,8 +766,6 @@ static void __init smp_boot_cpus(unsigned int max_cpus)
printk("CPU%d: ", 0); printk("CPU%d: ", 0);
print_cpu_info(&cpu_data[0]); print_cpu_info(&cpu_data[0]);
x86_apicid_to_cpu[boot_cpu_id] = 0;
x86_cpu_to_apicid[0] = boot_cpu_id;
current_thread_info()->cpu = 0; current_thread_info()->cpu = 0;
smp_tune_scheduling(); smp_tune_scheduling();
...@@ -837,6 +810,7 @@ static void __init smp_boot_cpus(unsigned int max_cpus) ...@@ -837,6 +810,7 @@ static void __init smp_boot_cpus(unsigned int max_cpus)
io_apic_irqs = 0; io_apic_irqs = 0;
cpu_online_map = phys_cpu_present_map = 1; cpu_online_map = phys_cpu_present_map = 1;
phys_cpu_present_map = 1; phys_cpu_present_map = 1;
disable_apic = 1;
return; return;
} }
...@@ -851,6 +825,7 @@ static void __init smp_boot_cpus(unsigned int max_cpus) ...@@ -851,6 +825,7 @@ static void __init smp_boot_cpus(unsigned int max_cpus)
io_apic_irqs = 0; io_apic_irqs = 0;
cpu_online_map = phys_cpu_present_map = 1; cpu_online_map = phys_cpu_present_map = 1;
phys_cpu_present_map = 1; phys_cpu_present_map = 1;
disable_apic = 1;
return; return;
} }
...@@ -878,13 +853,6 @@ static void __init smp_boot_cpus(unsigned int max_cpus) ...@@ -878,13 +853,6 @@ static void __init smp_boot_cpus(unsigned int max_cpus)
continue; continue;
do_boot_cpu(apicid); do_boot_cpu(apicid);
/*
* Make sure we unmap all failed CPUs
*/
if ((x86_apicid_to_cpu[apicid] == -1) &&
(phys_cpu_present_map & (1 << apicid)))
printk("phys CPU #%d not responding - cannot use it.\n",apicid);
} }
/* /*
......
...@@ -55,7 +55,6 @@ long sys_mmap(unsigned long addr, unsigned long len, unsigned long prot, unsigne ...@@ -55,7 +55,6 @@ long sys_mmap(unsigned long addr, unsigned long len, unsigned long prot, unsigne
if (!file) if (!file)
goto out; goto out;
} }
down_write(&current->mm->mmap_sem); down_write(&current->mm->mmap_sem);
error = do_mmap_pgoff(file, addr, len, prot, flags, off >> PAGE_SHIFT); error = do_mmap_pgoff(file, addr, len, prot, flags, off >> PAGE_SHIFT);
up_write(&current->mm->mmap_sem); up_write(&current->mm->mmap_sem);
......
...@@ -9,6 +9,7 @@ ...@@ -9,6 +9,7 @@
* Copyright (c) 1996 Ingo Molnar * Copyright (c) 1996 Ingo Molnar
* Copyright (c) 1998 Andrea Arcangeli * Copyright (c) 1998 Andrea Arcangeli
* Copyright (c) 2002 Vojtech Pavlik * Copyright (c) 2002 Vojtech Pavlik
* Copyright (c) 2003 Andi Kleen
* *
*/ */
...@@ -25,9 +26,14 @@ ...@@ -25,9 +26,14 @@
#include <linux/bcd.h> #include <linux/bcd.h>
#include <asm/vsyscall.h> #include <asm/vsyscall.h>
#include <asm/timex.h> #include <asm/timex.h>
#ifdef CONFIG_X86_LOCAL_APIC
#include <asm/apic.h>
#endif
u64 jiffies_64; u64 jiffies_64;
extern int using_apic_timer;
spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED; spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
extern int using_apic_timer; extern int using_apic_timer;
...@@ -56,12 +62,10 @@ struct timezone __sys_tz __section_sys_tz; ...@@ -56,12 +62,10 @@ struct timezone __sys_tz __section_sys_tz;
* together by xtime_lock. * together by xtime_lock.
*/ */
static spinlock_t time_offset_lock = SPIN_LOCK_UNLOCKED;
static unsigned long timeoffset = 0;
inline unsigned int do_gettimeoffset(void) inline unsigned int do_gettimeoffset(void)
{ {
unsigned long t; unsigned long t;
sync_core();
rdtscll(t); rdtscll(t);
return (t - hpet.last_tsc) * (1000000L / HZ) / hpet.ticks + hpet.offset; return (t - hpet.last_tsc) * (1000000L / HZ) / hpet.ticks + hpet.offset;
} }
...@@ -74,10 +78,9 @@ inline unsigned int do_gettimeoffset(void) ...@@ -74,10 +78,9 @@ inline unsigned int do_gettimeoffset(void)
void do_gettimeofday(struct timeval *tv) void do_gettimeofday(struct timeval *tv)
{ {
unsigned long flags, t, seq; unsigned long seq, t;
unsigned int sec, usec; unsigned int sec, usec;
spin_lock_irqsave(&time_offset_lock, flags);
do { do {
seq = read_seqbegin(&xtime_lock); seq = read_seqbegin(&xtime_lock);
...@@ -85,11 +88,9 @@ void do_gettimeofday(struct timeval *tv) ...@@ -85,11 +88,9 @@ void do_gettimeofday(struct timeval *tv)
usec = xtime.tv_nsec / 1000; usec = xtime.tv_nsec / 1000;
t = (jiffies - wall_jiffies) * (1000000L / HZ) + do_gettimeoffset(); t = (jiffies - wall_jiffies) * (1000000L / HZ) + do_gettimeoffset();
if (t > timeoffset) timeoffset = t; usec += t;
usec += timeoffset;
} while (read_seqretry(&xtime_lock, seq)); } while (read_seqretry(&xtime_lock, seq));
spin_unlock_irqrestore(&time_offset_lock, flags);
tv->tv_sec = sec + usec / 1000000; tv->tv_sec = sec + usec / 1000000;
tv->tv_usec = usec % 1000000; tv->tv_usec = usec % 1000000;
...@@ -104,7 +105,6 @@ void do_gettimeofday(struct timeval *tv) ...@@ -104,7 +105,6 @@ void do_gettimeofday(struct timeval *tv)
void do_settimeofday(struct timeval *tv) void do_settimeofday(struct timeval *tv)
{ {
write_seqlock_irq(&xtime_lock); write_seqlock_irq(&xtime_lock);
vxtime_lock();
tv->tv_usec -= do_gettimeoffset() + tv->tv_usec -= do_gettimeoffset() +
(jiffies - wall_jiffies) * tick_usec; (jiffies - wall_jiffies) * tick_usec;
...@@ -116,7 +116,6 @@ void do_settimeofday(struct timeval *tv) ...@@ -116,7 +116,6 @@ void do_settimeofday(struct timeval *tv)
xtime.tv_sec = tv->tv_sec; xtime.tv_sec = tv->tv_sec;
xtime.tv_nsec = (tv->tv_usec * 1000); xtime.tv_nsec = (tv->tv_usec * 1000);
vxtime_unlock();
time_adjust = 0; /* stop active adjtime() */ time_adjust = 0; /* stop active adjtime() */
time_status |= STA_UNSYNC; time_status |= STA_UNSYNC;
...@@ -207,11 +206,11 @@ static void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs) ...@@ -207,11 +206,11 @@ static void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
*/ */
write_seqlock(&xtime_lock); write_seqlock(&xtime_lock);
vxtime_lock();
{ {
unsigned long t; unsigned long t;
sync_core();
rdtscll(t); rdtscll(t);
hpet.offset = (t - hpet.last_tsc) * (1000000L / HZ) / hpet.ticks + hpet.offset - 1000000L / HZ; hpet.offset = (t - hpet.last_tsc) * (1000000L / HZ) / hpet.ticks + hpet.offset - 1000000L / HZ;
if (hpet.offset >= 1000000L / HZ) if (hpet.offset >= 1000000L / HZ)
...@@ -219,7 +218,6 @@ static void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs) ...@@ -219,7 +218,6 @@ static void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
hpet.ticks = min_t(long, max_t(long, (t - hpet.last_tsc) * (1000000L / HZ) / (1000000L / HZ - hpet.offset), hpet.ticks = min_t(long, max_t(long, (t - hpet.last_tsc) * (1000000L / HZ) / (1000000L / HZ - hpet.offset),
cpu_khz * 1000/HZ * 15 / 16), cpu_khz * 1000/HZ * 16 / 15); cpu_khz * 1000/HZ * 15 / 16), cpu_khz * 1000/HZ * 16 / 15);
hpet.last_tsc = t; hpet.last_tsc = t;
timeoffset = 0;
} }
/* /*
...@@ -255,7 +253,6 @@ static void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs) ...@@ -255,7 +253,6 @@ static void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
rtc_update = xtime.tv_sec + 660; rtc_update = xtime.tv_sec + 660;
} }
vxtime_unlock();
write_sequnlock(&xtime_lock); write_sequnlock(&xtime_lock);
} }
...@@ -348,8 +345,9 @@ static unsigned int __init pit_calibrate_tsc(void) ...@@ -348,8 +345,9 @@ static unsigned int __init pit_calibrate_tsc(void)
outb((1193182 / (1000 / 50)) & 0xff, 0x42); outb((1193182 / (1000 / 50)) & 0xff, 0x42);
outb((1193182 / (1000 / 50)) >> 8, 0x42); outb((1193182 / (1000 / 50)) >> 8, 0x42);
rdtscll(start); rdtscll(start);
sync_core();
while ((inb(0x61) & 0x20) == 0); while ((inb(0x61) & 0x20) == 0);
sync_core();
rdtscll(end); rdtscll(end);
...@@ -382,12 +380,12 @@ void __init time_init(void) ...@@ -382,12 +380,12 @@ void __init time_init(void)
pit_init(); pit_init();
printk(KERN_INFO "time.c: Using 1.1931816 MHz PIT timer.\n"); printk(KERN_INFO "time.c: Using 1.1931816 MHz PIT timer.\n");
setup_irq(0, &irq0);
cpu_khz = pit_calibrate_tsc(); cpu_khz = pit_calibrate_tsc();
printk(KERN_INFO "time.c: Detected %d.%03d MHz processor.\n", printk(KERN_INFO "time.c: Detected %d.%03d MHz processor.\n",
cpu_khz / 1000, cpu_khz % 1000); cpu_khz / 1000, cpu_khz % 1000);
hpet.ticks = cpu_khz * (1000 / HZ); hpet.ticks = cpu_khz * (1000 / HZ);
rdtscll(hpet.last_tsc); rdtscll(hpet.last_tsc);
setup_irq(0, &irq0);
} }
__setup("report_lost_ticks", time_setup); __setup("report_lost_ticks", time_setup);
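The interpolation in do_gettimeoffset() scales elapsed TSC cycles into microseconds using hpet.ticks (cycles per timer tick); a worked example under assumed numbers:

static unsigned long gtod_offset_example(void)
{
	const unsigned long cpu_khz = 1000000;		/* assumed 1 GHz CPU */
	const unsigned long hz = 100;			/* assumed HZ */
	unsigned long ticks = cpu_khz * (1000 / hz);	/* 10,000,000 cycles/tick */
	unsigned long elapsed = 5000000;		/* cycles since last tick */

	/* (elapsed cycles) * (us per tick) / (cycles per tick) */
	return elapsed * (1000000 / hz) / ticks;	/* == 5000 us, half a tick */
}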
...@@ -77,6 +77,12 @@ extern int exception_trace; ...@@ -77,6 +77,12 @@ extern int exception_trace;
struct notifier_block *die_chain; struct notifier_block *die_chain;
static inline void conditional_sti(struct pt_regs *regs)
{
if (regs->eflags & X86_EFLAGS_IF)
local_irq_enable();
}
static int kstack_depth_to_print = 10; static int kstack_depth_to_print = 10;
#ifdef CONFIG_KALLSYMS #ifdef CONFIG_KALLSYMS
...@@ -128,8 +134,7 @@ void show_trace(unsigned long *stack) ...@@ -128,8 +134,7 @@ void show_trace(unsigned long *stack)
{ {
unsigned long addr; unsigned long addr;
unsigned long *irqstack, *irqstack_end, *estack_end; unsigned long *irqstack, *irqstack_end, *estack_end;
/* FIXME: should read the cpuid from the APIC; to still work with bogus %gs */ const int cpu = safe_smp_processor_id();
const int cpu = smp_processor_id();
int i; int i;
printk("\nCall Trace:"); printk("\nCall Trace:");
...@@ -210,7 +215,7 @@ void show_stack(unsigned long * rsp) ...@@ -210,7 +215,7 @@ void show_stack(unsigned long * rsp)
{ {
unsigned long *stack; unsigned long *stack;
int i; int i;
const int cpu = smp_processor_id(); const int cpu = safe_smp_processor_id();
unsigned long *irqstack_end = (unsigned long *) (cpu_pda[cpu].irqstackptr); unsigned long *irqstack_end = (unsigned long *) (cpu_pda[cpu].irqstackptr);
unsigned long *irqstack = (unsigned long *) (cpu_pda[cpu].irqstackptr - IRQSTACKSIZE); unsigned long *irqstack = (unsigned long *) (cpu_pda[cpu].irqstackptr - IRQSTACKSIZE);
...@@ -252,12 +257,7 @@ void show_registers(struct pt_regs *regs) ...@@ -252,12 +257,7 @@ void show_registers(struct pt_regs *regs)
int i; int i;
int in_kernel = (regs->cs & 3) == 0; int in_kernel = (regs->cs & 3) == 0;
unsigned long rsp; unsigned long rsp;
#ifdef CONFIG_SMP const int cpu = safe_smp_processor_id();
/* For SMP should get the APIC id here, just to protect against corrupted GS */
const int cpu = smp_processor_id();
#else
const int cpu = 0;
#endif
struct task_struct *cur = cpu_pda[cpu].pcurrent; struct task_struct *cur = cpu_pda[cpu].pcurrent;
rsp = regs->rsp; rsp = regs->rsp;
...@@ -330,7 +330,7 @@ void die(const char * str, struct pt_regs * regs, long err) ...@@ -330,7 +330,7 @@ void die(const char * str, struct pt_regs * regs, long err)
bust_spinlocks(1); bust_spinlocks(1);
handle_BUG(regs); handle_BUG(regs);
printk("%s: %04lx\n", str, err & 0xffff); printk("%s: %04lx\n", str, err & 0xffff);
cpu = smp_processor_id(); cpu = safe_smp_processor_id();
/* racy, but better than risking deadlock. */ /* racy, but better than risking deadlock. */
local_irq_disable(); local_irq_disable();
if (!spin_trylock(&die_lock)) { if (!spin_trylock(&die_lock)) {
...@@ -365,10 +365,12 @@ static inline unsigned long get_cr2(void) ...@@ -365,10 +365,12 @@ static inline unsigned long get_cr2(void)
static void do_trap(int trapnr, int signr, char *str, static void do_trap(int trapnr, int signr, char *str,
struct pt_regs * regs, long error_code, siginfo_t *info) struct pt_regs * regs, long error_code, siginfo_t *info)
{ {
conditional_sti(regs);
#ifdef CONFIG_CHECKING #ifdef CONFIG_CHECKING
{ {
unsigned long gs; unsigned long gs;
struct x8664_pda *pda = cpu_pda + stack_smp_processor_id(); struct x8664_pda *pda = cpu_pda + safe_smp_processor_id();
rdmsrl(MSR_GS_BASE, gs); rdmsrl(MSR_GS_BASE, gs);
if (gs != (unsigned long)pda) { if (gs != (unsigned long)pda) {
wrmsrl(MSR_GS_BASE, pda); wrmsrl(MSR_GS_BASE, pda);
...@@ -454,10 +456,12 @@ extern void dump_pagetable(unsigned long); ...@@ -454,10 +456,12 @@ extern void dump_pagetable(unsigned long);
asmlinkage void do_general_protection(struct pt_regs * regs, long error_code) asmlinkage void do_general_protection(struct pt_regs * regs, long error_code)
{ {
conditional_sti(regs);
#ifdef CONFIG_CHECKING #ifdef CONFIG_CHECKING
{ {
unsigned long gs; unsigned long gs;
struct x8664_pda *pda = cpu_pda + hard_smp_processor_id(); struct x8664_pda *pda = cpu_pda + safe_smp_processor_id();
rdmsrl(MSR_GS_BASE, gs); rdmsrl(MSR_GS_BASE, gs);
if (gs != (unsigned long)pda) { if (gs != (unsigned long)pda) {
wrmsrl(MSR_GS_BASE, pda); wrmsrl(MSR_GS_BASE, pda);
...@@ -565,7 +569,7 @@ asmlinkage void do_debug(struct pt_regs * regs, long error_code) ...@@ -565,7 +569,7 @@ asmlinkage void do_debug(struct pt_regs * regs, long error_code)
#ifdef CONFIG_CHECKING #ifdef CONFIG_CHECKING
{ {
unsigned long gs; unsigned long gs;
struct x8664_pda *pda = cpu_pda + stack_smp_processor_id(); struct x8664_pda *pda = cpu_pda + safe_smp_processor_id();
rdmsrl(MSR_GS_BASE, gs); rdmsrl(MSR_GS_BASE, gs);
if (gs != (unsigned long)pda) { if (gs != (unsigned long)pda) {
wrmsrl(MSR_GS_BASE, pda); wrmsrl(MSR_GS_BASE, pda);
...@@ -576,6 +580,8 @@ asmlinkage void do_debug(struct pt_regs * regs, long error_code) ...@@ -576,6 +580,8 @@ asmlinkage void do_debug(struct pt_regs * regs, long error_code)
asm("movq %%db6,%0" : "=r" (condition)); asm("movq %%db6,%0" : "=r" (condition));
conditional_sti(regs);
if (notify_die(DIE_DEBUG, "debug", regs, error_code) == NOTIFY_BAD) if (notify_die(DIE_DEBUG, "debug", regs, error_code) == NOTIFY_BAD)
return; return;
...@@ -636,7 +642,6 @@ void math_error(void *rip) ...@@ -636,7 +642,6 @@ void math_error(void *rip)
struct task_struct * task; struct task_struct * task;
siginfo_t info; siginfo_t info;
unsigned short cwd, swd; unsigned short cwd, swd;
/* /*
* Save the info for the exception handler and clear the error. * Save the info for the exception handler and clear the error.
*/ */
...@@ -688,6 +693,7 @@ void math_error(void *rip) ...@@ -688,6 +693,7 @@ void math_error(void *rip)
asmlinkage void do_coprocessor_error(struct pt_regs * regs, long error_code) asmlinkage void do_coprocessor_error(struct pt_regs * regs, long error_code)
{ {
conditional_sti(regs);
math_error((void *)regs->rip); math_error((void *)regs->rip);
} }
...@@ -747,6 +753,7 @@ static inline void simd_math_error(void *rip) ...@@ -747,6 +753,7 @@ static inline void simd_math_error(void *rip)
asmlinkage void do_simd_coprocessor_error(struct pt_regs * regs, asmlinkage void do_simd_coprocessor_error(struct pt_regs * regs,
long error_code) long error_code)
{ {
conditional_sti(regs);
simd_math_error((void *)regs->rip); simd_math_error((void *)regs->rip);
} }
......
...@@ -2,6 +2,7 @@ ...@@ -2,6 +2,7 @@
* linux/arch/x86_64/kernel/vsyscall.c * linux/arch/x86_64/kernel/vsyscall.c
* *
* Copyright (C) 2001 Andrea Arcangeli <andrea@suse.de> SuSE * Copyright (C) 2001 Andrea Arcangeli <andrea@suse.de> SuSE
* Copyright 2003 Andi Kleen, SuSE Labs.
* *
* Thanks to hpa@transmeta.com for some useful hints. * Thanks to hpa@transmeta.com for some useful hints.
* Special thanks to Ingo Molnar for his early experience with * Special thanks to Ingo Molnar for his early experience with
...@@ -12,7 +13,8 @@ ...@@ -12,7 +13,8 @@
* vsyscalls. One vsyscall can reserve more than 1 slot to avoid * vsyscalls. One vsyscall can reserve more than 1 slot to avoid
* jumping out of line if necessary. * jumping out of line if necessary.
* *
* $Id: vsyscall.c,v 1.9 2002/03/21 13:42:58 ak Exp $ * Note: the concept clashes with user mode linux. If you use UML just
* set the kernel.vsyscall sysctl to 0.
*/ */
/* /*
...@@ -29,6 +31,9 @@ ...@@ -29,6 +31,9 @@
* broken programs will segfault and there's no security risk until we choose to * broken programs will segfault and there's no security risk until we choose to
* fix it. * fix it.
* *
* Add HPET support (port from 2.4). Still needed?
* Nop out vsyscall syscall to avoid anchor for buffer overflows when sysctl off.
*
* These are not urgent things that we need to address only before shipping the first * These are not urgent things that we need to address only before shipping the first
* production binary kernels. * production binary kernels.
*/ */
...@@ -37,6 +42,7 @@ ...@@ -37,6 +42,7 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/timer.h> #include <linux/timer.h>
#include <linux/seqlock.h>
#include <asm/vsyscall.h> #include <asm/vsyscall.h>
#include <asm/pgtable.h> #include <asm/pgtable.h>
...@@ -44,19 +50,13 @@ ...@@ -44,19 +50,13 @@
#include <asm/fixmap.h> #include <asm/fixmap.h>
#include <asm/errno.h> #include <asm/errno.h>
#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr))) #define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
#define NO_VSYSCALL 1 int __sysctl_vsyscall __section_sysctl_vsyscall = 1;
seqlock_t __xtime_lock __section_xtime_lock = SEQLOCK_UNLOCKED;
#ifdef NO_VSYSCALL
#include <asm/unistd.h> #include <asm/unistd.h>
static int errno __section_vxtime_sequence;
static inline _syscall2(int,gettimeofday,struct timeval *,tv,struct timezone *,tz)
#else
static inline void timeval_normalize(struct timeval * tv) static inline void timeval_normalize(struct timeval * tv)
{ {
time_t __sec; time_t __sec;
...@@ -69,63 +69,60 @@ static inline void timeval_normalize(struct timeval * tv) ...@@ -69,63 +69,60 @@ static inline void timeval_normalize(struct timeval * tv)
} }
} }
long __vxtime_sequence[2] __section_vxtime_sequence;
static inline void do_vgettimeofday(struct timeval * tv) static inline void do_vgettimeofday(struct timeval * tv)
{ {
long sequence, t; long sequence, t;
unsigned long sec, usec; unsigned long sec, usec;
do { do {
sequence = __vxtime_sequence[1]; sequence = read_seqbegin(&__xtime_lock);
rmb();
sync_core();
rdtscll(t); rdtscll(t);
sec = __xtime.tv_sec; sec = __xtime.tv_sec;
usec = __xtime.tv_usec + usec = (__xtime.tv_nsec / 1000) +
(__jiffies - __wall_jiffies) * (1000000 / HZ) + (__jiffies - __wall_jiffies) * (1000000 / HZ) +
(t - __hpet.last_tsc) * (1000000 / HZ) / __hpet.ticks + __hpet.offset; (t - __hpet.last_tsc) * (1000000 / HZ) / __hpet.ticks + __hpet.offset;
rmb(); } while (read_seqretry(&__xtime_lock, sequence));
} while (sequence != __vxtime_sequence[0]);
tv->tv_sec = sec + usec / 1000000; tv->tv_sec = sec + usec / 1000000;
tv->tv_usec = usec % 1000000; tv->tv_usec = usec % 1000000;
} }
/* RED-PEN may want to readd seq locking, but then the variable should be write-once. */
static inline void do_get_tz(struct timezone * tz) static inline void do_get_tz(struct timezone * tz)
{ {
long sequence;
do {
sequence = __vxtime_sequence[1];
rmb();
*tz = __sys_tz; *tz = __sys_tz;
}
rmb(); static inline int gettimeofday(struct timeval *tv, struct timezone *tz)
} while (sequence != __vxtime_sequence[0]); {
int ret;
asm volatile("syscall"
: "=a" (ret)
: "0" (__NR_gettimeofday),"D" (tv),"S" (tz) : __syscall_clobber );
return ret;
} }
#endif
static int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz) static int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
{ {
#ifdef NO_VSYSCALL if (unlikely(!__sysctl_vsyscall))
return gettimeofday(tv,tz); return gettimeofday(tv,tz);
#else
if (tv) if (tv)
do_vgettimeofday(tv); do_vgettimeofday(tv);
if (tz) if (tz)
do_get_tz(tz); do_get_tz(tz);
return 0; return 0;
#endif
} }
static time_t __vsyscall(1) vtime(time_t * t) static time_t __vsyscall(1) vtime(time_t * t)
{ {
struct timeval tv; struct timeval tv;
vgettimeofday(&tv,NULL); if (unlikely(!__sysctl_vsyscall))
gettimeofday(&tv, NULL);
else
do_vgettimeofday(&tv);
if (t) if (t)
*t = tv.tv_sec; *t = tv.tv_sec;
return tv.tv_sec; return tv.tv_sec;
...@@ -139,12 +136,13 @@ static long __vsyscall(2) venosys_0(void) ...@@ -139,12 +136,13 @@ static long __vsyscall(2) venosys_0(void)
static long __vsyscall(3) venosys_1(void) static long __vsyscall(3) venosys_1(void)
{ {
return -ENOSYS; return -ENOSYS;
} }
static void __init map_vsyscall(void) static void __init map_vsyscall(void)
{ {
extern char __vsyscall_0; extern char __vsyscall_0;
unsigned long physaddr_page0 = (unsigned long) &__vsyscall_0 - __START_KERNEL_map; unsigned long physaddr_page0 = __pa_symbol(&__vsyscall_0);
__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL); __set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
} }
......
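Userspace reaches these entry points at fixed addresses in the vsyscall page mapped by map_vsyscall(); a call sketch, assuming the conventional base address and a 1024-byte slot stride per __vsyscall(nr):

#include <sys/time.h>

#define VSYSCALL_ADDR(nr)	(0xffffffffff600000UL + (nr) * 1024UL)

/* slot 0 is vgettimeofday, slot 1 is vtime (see __vsyscall(nr) above) */
int vgettimeofday_call(struct timeval *tv, struct timezone *tz)
{
	int (*vgtod)(struct timeval *, struct timezone *) =
		(void *) VSYSCALL_ADDR(0);
	return vgtod(tv, tz);
}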
/*
* ACPI S3 entry/exit handling.
*
* Notes:
* Relies on kernel being loaded below 4GB.
* Needs restore_low_mappings called before.
*
* Copyright 2003 by Andi Kleen, SuSE Labs.
*
* Long mode entry loosely based on example code in chapter 14 of the x86-64 system
* programmer's manual.
*
* Notebook:
FIXME need to interface with suspend.c properly. do_magic. check i386. rename to suspend64.S
Need to fix vgacon,mtrr,bluesmoke to do resume
Interrupts should be off until the io-apic code has reinited the APIC.
Need support for that in the PM framework or a special hack?
SMP support is nonexistent. Need to somehow restart the other CPUs again.
If CPU hotplug were working it could be used. Save/Restore needs to run on the same CPU.
Should check magic like i386 code
suspend code copies something. check what it is.
*/
#include <linux/linkage.h>
#include <asm/msr.h>
#include <asm/segment.h>
#include <asm/page.h>
#define O(x) (x-acpi_wakeup)
.text
.code16
ENTRY(acpi_wakeup)
/* 16bit real mode entered from ACPI BIOS */
/* The machine has just come through BIOS setup after power-down, and everything
set up by Linux needs to be restored. */
/* The code here needs to be position independent or manually relocated,
because it is copied to a <1MB page for real mode execution */
/* A20 enabled (according to ACPI spec) */
/* cs = acpi_wakeup >> 4 ; eip = acpi_wakeup & 0xF */
movw %cs,%ax
movw %ax,%ds /* make %ds point to acpi_wakeup */
movw %ax,%ss
movw $O(wakeup_stack),%sp /* setup stack */
pushl $0
popfl /* clear EFLAGS */
lgdt %ds:O(pGDT) /* load kernel GDT */
movl $0x1,%eax /* enable protected mode */
movl %eax,%cr0
movl %ds:O(wakeup_page_table),%edi
ljmpl $__KERNEL16_CS,$0 /* -> s3_prot16 (filled in earlier by caller) */
/* patched by s3_restore_state below */
pGDT:
.short 0
.quad 0
.align 4
.globl wakeup_page_table
wakeup_page_table:
.long 0
.align 8
wakeup_stack:
.fill 128,1,0
.globl acpi_wakeup_end
acpi_wakeup_end:
/* end of real mode trampoline */
/* pointed to by __KERNEL16_CS:0 */
.code16
ENTRY(s3_prot16)
/* Now in 16bit protected mode, still no paging, stack/data segments invalid */
/* Prepare everything for 64bit paging, but still keep it turned off */
movl %cr4,%eax
bts $5,%eax /* set PAE bit */
movl %eax,%cr4
movl %edi,%cr3 /* load kernel page table */
movl $0x80000001,%eax
cpuid /* no execute supported ? */
movl %edx,%esi
movl $MSR_EFER,%ecx
rdmsr
bts $8,%eax /* long mode */
bt $20,%esi /* NX supported ? */
jnc 1f
bts $_EFER_NX,%eax /* set no-execute enable */
1:
wrmsr /* set temporary efer - real one is restored a bit later */
movl %cr0,%eax
bts $31,%eax /* paging */
movl %eax,%cr0
/* running in identity mapping now */
/* go to 64bit code segment */
ljmpl $__KERNEL_CS,$s3_restore_state-__START_KERNEL_map
.code64
.macro SAVEMSR msr,target
movl $\msr,%ecx
rdmsr
shlq $32,%rdx
orq %rax,%rdx
movq %rdx,\target(%rip)
.endm
.macro RESTMSR msr,src
movl $\msr,%ecx
movq \src(%rip),%rax
movq %rax,%rdx
shrq $32,%rdx
wrmsr
.endm
.macro SAVECTL reg
movq %\reg,%rax
movq %rax,saved_\reg(%rip)
.endm
.macro RESTCTL reg
movq saved_\reg(%rip),%rax
movq %rax,%\reg
.endm
/* Running in identity mapping, long mode */
s3_restore_state_low:
movq $s3_restore_state,%rax
jmpq *%rax
/* Running in real kernel mapping now */
s3_restore_state:
xorl %eax,%eax
movl %eax,%ds
movq saved_rsp(%rip),%rsp
movw saved_ss(%rip),%ss
movw saved_fs(%rip),%fs
movw saved_gs(%rip),%gs
movw saved_es(%rip),%es
movw saved_ds(%rip),%ds
lidt saved_idt
ltr saved_tr
lldt saved_ldt
/* gdt is already loaded */
RESTCTL cr0
RESTCTL cr4
/* cr3 is already loaded */
RESTMSR MSR_EFER,saved_efer
RESTMSR MSR_LSTAR,saved_lstar
RESTMSR MSR_CSTAR,saved_cstar
RESTMSR MSR_FS_BASE,saved_fs_base
RESTMSR MSR_GS_BASE,saved_gs_base
RESTMSR MSR_KERNEL_GS_BASE,saved_kernel_gs_base
RESTMSR MSR_SYSCALL_MASK,saved_syscall_mask
fxrstor fpustate(%rip)
RESTCTL dr0
RESTCTL dr1
RESTCTL dr2
RESTCTL dr3
RESTCTL dr6
RESTCTL dr7
movq saved_rflags(%rip),%rax
pushq %rax
popfq
movq saved_rbp(%rip),%rbp
movq saved_rbx(%rip),%rbx
movq saved_r12(%rip),%r12
movq saved_r13(%rip),%r13
movq saved_r14(%rip),%r14
movq saved_r15(%rip),%r15
ret
ENTRY(acpi_prepare_wakeup)
sgdt saved_gdt
/* copy gdt descr and page table to low level wakeup code so that it can
reload them early. */
movq acpi_wakeup_address(%rip),%rax
movw saved_gdt+8(%rip),%cx
movw %cx,O(pGDT)+8(%rax)
movq saved_gdt(%rip),%rcx
movq %rcx,O(pGDT)(%rax)
movq %cr3,%rdi
movl %edi,O(wakeup_page_table)(%rax)
ret
/* Save CPU state. */
/* Everything saved here needs to be restored above. */
ENTRY(do_suspend_lowlevel)
testl %edi,%edi
jnz s3_restore_state
SAVECTL cr0
SAVECTL cr4
SAVECTL cr3
str saved_tr
sidt saved_idt
sgdt saved_gdt
sldt saved_ldt
SAVEMSR MSR_EFER,saved_efer
SAVEMSR MSR_LSTAR,saved_lstar
SAVEMSR MSR_CSTAR,saved_cstar
SAVEMSR MSR_FS_BASE,saved_fs_base
SAVEMSR MSR_GS_BASE,saved_gs_base
SAVEMSR MSR_KERNEL_GS_BASE,saved_kernel_gs_base
SAVEMSR MSR_SYSCALL_MASK,saved_syscall_mask
movw %ds,saved_ds(%rip)
movw %es,saved_es(%rip)
movw %fs,saved_fs(%rip)
movw %gs,saved_gs(%rip)
movw %ss,saved_ss(%rip)
movq %rsp,saved_rsp(%rip)
pushfq
popq %rax
movq %rax,saved_rflags(%rip)
SAVECTL dr0
SAVECTL dr1
SAVECTL dr2
SAVECTL dr3
SAVECTL dr6
SAVECTL dr7
fxsave fpustate(%rip)
/* finally save callee saved registers */
movq %rbp,saved_rbp(%rip)
movq %rbx,saved_rbx(%rip)
movq %r12,saved_r12(%rip)
movq %r13,saved_r13(%rip)
movq %r14,saved_r14(%rip)
movq %r15,saved_r15(%rip)
movq $3,%rdi
call acpi_enter_sleep_state
ret /* should not happen */
.data
.align 8
saved_efer: .quad 0
saved_lstar: .quad 0
saved_cstar: .quad 0
saved_cr4: .quad 0
saved_cr3: .quad 0
saved_cr0: .quad 0
saved_rbp: .quad 0
saved_rbx: .quad 0
saved_rsp: .quad 0
saved_r12: .quad 0
saved_r13: .quad 0
saved_r14: .quad 0
saved_r15: .quad 0
saved_rflags: .quad 0
saved_gs_base: .quad 0
saved_fs_base: .quad 0
saved_kernel_gs_base: .quad 0
saved_syscall_mask: .quad 0
saved_dr0: .quad 0
saved_dr1: .quad 0
saved_dr2: .quad 0
saved_dr3: .quad 0
saved_dr6: .quad 0
saved_dr7: .quad 0
saved_ds: .short 0
saved_fs: .short 0
saved_gs: .short 0
saved_es: .short 0
saved_ss: .short 0
saved_idt: .short 0
.quad 0
saved_ldt: .short 0
saved_gdt: .short 0
.quad 0
saved_tr: .short 0
.align 16
fpustate: .fill 512,1,0
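The SAVEMSR/RESTMSR macros above shuffle a 64-bit MSR value through the edx:eax pair that rdmsr/wrmsr use; the same combine/split in C, for reference:

/* C equivalent of SAVEMSR/RESTMSR: the 64-bit value is stitched
 * together from (or split back into) the edx:eax halves */
static inline unsigned long long msr_read(unsigned int msr)
{
	unsigned int lo, hi;
	asm volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr));
	return ((unsigned long long)hi << 32) | lo;
}

static inline void msr_write(unsigned int msr, unsigned long long val)
{
	asm volatile("wrmsr" :: "c" (msr),
		     "a" ((unsigned int)val), "d" ((unsigned int)(val >> 32)));
}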
# #
# Makefile for the linux i386-specific parts of the memory manager. # Makefile for the linux x86_64-specific parts of the memory manager.
# #
obj-y := init.o fault.o ioremap.o extable.o pageattr.o obj-y := init.o fault.o ioremap.o extable.o pageattr.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_DISCONTIGMEM) += numa.o
obj-$(CONFIG_K8_NUMA) += k8topology.o
...@@ -121,7 +121,10 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code) ...@@ -121,7 +121,10 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
/* get the address */ /* get the address */
__asm__("movq %%cr2,%0":"=r" (address)); __asm__("movq %%cr2,%0":"=r" (address));
if (page_fault_trace) if (likely(regs->eflags & X86_EFLAGS_IF))
local_irq_enable();
if (unlikely(page_fault_trace))
printk("pagefault rip:%lx rsp:%lx cs:%lu ss:%lu address %lx error %lx\n", printk("pagefault rip:%lx rsp:%lx cs:%lu ss:%lu address %lx error %lx\n",
regs->rip,regs->rsp,regs->cs,regs->ss,address,error_code); regs->rip,regs->rsp,regs->cs,regs->ss,address,error_code);
...@@ -139,7 +142,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code) ...@@ -139,7 +142,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
* If we're in an interrupt or have no user * If we're in an interrupt or have no user
* context, we must not take the fault.. * context, we must not take the fault..
*/ */
if (in_atomic() || !mm) if (unlikely(in_atomic() || !mm))
goto no_context; goto no_context;
again: again:
...@@ -148,7 +151,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code) ...@@ -148,7 +151,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
vma = find_vma(mm, address); vma = find_vma(mm, address);
if (!vma) if (!vma)
goto bad_area; goto bad_area;
if (vma->vm_start <= address) if (likely(vma->vm_start <= address))
goto good_area; goto good_area;
if (!(vma->vm_flags & VM_GROWSDOWN)) if (!(vma->vm_flags & VM_GROWSDOWN))
goto bad_area; goto bad_area;
...@@ -222,7 +225,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code) ...@@ -222,7 +225,8 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
return; return;
} }
#endif #endif
printk("%s[%d] segfault at rip:%lx rsp:%lx adr:%lx err:%lx\n", printk(KERN_INFO
"%s[%d] segfault at rip:%lx rsp:%lx adr:%lx err:%lx\n",
tsk->comm, tsk->pid, regs->rip, regs->rsp, address, tsk->comm, tsk->pid, regs->rip, regs->rsp, address,
error_code); error_code);
......
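The likely()/unlikely() annotations sprinkled through the fault path are the standard __builtin_expect wrappers; for reference:

/* standard definitions (include/linux/compiler.h) */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)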
...@@ -162,6 +162,37 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -162,6 +162,37 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
return i; return i;
} }
struct page *
follow_huge_addr(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address, int write)
{
return NULL;
}
struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
{
return NULL;
}
int pmd_huge(pmd_t pmd)
{
return !!(pmd_val(pmd) & _PAGE_PSE);
}
struct page *
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
struct page *page;
page = pte_page(*(pte_t *)pmd);
if (page) {
page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
}
return page;
}
void free_huge_page(struct page *page) void free_huge_page(struct page *page)
{ {
BUG_ON(page_count(page)); BUG_ON(page_count(page));
...@@ -193,8 +224,6 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig ...@@ -193,8 +224,6 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig
BUG_ON(start & (HPAGE_SIZE - 1)); BUG_ON(start & (HPAGE_SIZE - 1));
BUG_ON(end & (HPAGE_SIZE - 1)); BUG_ON(end & (HPAGE_SIZE - 1));
spin_lock(&htlbpage_lock);
spin_unlock(&htlbpage_lock);
for (address = start; address < end; address += HPAGE_SIZE) { for (address = start; address < end; address += HPAGE_SIZE) {
pte = huge_pte_offset(mm, address); pte = huge_pte_offset(mm, address);
page = pte_page(*pte); page = pte_page(*pte);
...@@ -216,7 +245,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne ...@@ -216,7 +245,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne
int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct inode = mapping->host; struct inode *inode = mapping->host;
unsigned long addr; unsigned long addr;
int ret = 0; int ret = 0;
......
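follow_huge_pmd() indexes into the compound page with the sub-huge-page offset; a worked example, assuming 2 MB huge pages (HPAGE_SHIFT = 21, PAGE_SHIFT = 12):

static unsigned long huge_subpage_index(unsigned long address)
{
	const unsigned long hpage_mask = ~((1UL << 21) - 1);

	/* e.g. address 0x40012345: masking with ~hpage_mask leaves the
	 * in-huge-page offset 0x12345; >> 12 selects base page 0x12 */
	return (address & ~hpage_mask) >> 12;
}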
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
* *
* Copyright (C) 1995 Linus Torvalds * Copyright (C) 1995 Linus Torvalds
* Copyright (C) 2000 Pavel Machek <pavel@suse.cz> * Copyright (C) 2000 Pavel Machek <pavel@suse.cz>
* Copyright (C) 2002 Andi Kleen <ak@suse.de> * Copyright (C) 2002,2003 Andi Kleen <ak@suse.de>
*/ */
#include <linux/config.h> #include <linux/config.h>
...@@ -37,8 +37,9 @@ ...@@ -37,8 +37,9 @@
#include <asm/tlb.h> #include <asm/tlb.h>
#include <asm/mmu_context.h> #include <asm/mmu_context.h>
#include <asm/proto.h> #include <asm/proto.h>
#include <asm/smp.h>
unsigned long start_pfn, end_pfn; #define Dprintk(x...) printk(x)
struct mmu_gather mmu_gathers[NR_CPUS]; struct mmu_gather mmu_gathers[NR_CPUS];
...@@ -90,9 +91,11 @@ static void *spp_getpage(void) ...@@ -90,9 +91,11 @@ static void *spp_getpage(void)
if (after_bootmem) if (after_bootmem)
ptr = (void *) get_zeroed_page(GFP_ATOMIC); ptr = (void *) get_zeroed_page(GFP_ATOMIC);
else else
ptr = alloc_bootmem_low(PAGE_SIZE); ptr = alloc_bootmem_pages(PAGE_SIZE);
if (!ptr) if (!ptr || ((unsigned long)ptr & ~PAGE_MASK))
panic("set_pte_phys: cannot allocate page data %s\n", after_bootmem?"after bootmem":""); panic("set_pte_phys: cannot allocate page data %s\n", after_bootmem?"after bootmem":"");
Dprintk("spp_getpage %p\n", ptr);
return ptr; return ptr;
} }
...@@ -104,6 +107,8 @@ static void set_pte_phys(unsigned long vaddr, ...@@ -104,6 +107,8 @@ static void set_pte_phys(unsigned long vaddr,
pmd_t *pmd; pmd_t *pmd;
pte_t *pte; pte_t *pte;
Dprintk("set_pte_phys %lx to %lx\n", vaddr, phys);
level4 = pml4_offset_k(vaddr); level4 = pml4_offset_k(vaddr);
if (pml4_none(*level4)) { if (pml4_none(*level4)) {
printk("PML4 FIXMAP MISSING, it should be setup in head.S!\n"); printk("PML4 FIXMAP MISSING, it should be setup in head.S!\n");
...@@ -114,7 +119,7 @@ static void set_pte_phys(unsigned long vaddr, ...@@ -114,7 +119,7 @@ static void set_pte_phys(unsigned long vaddr,
pmd = (pmd_t *) spp_getpage(); pmd = (pmd_t *) spp_getpage();
set_pgd(pgd, __pgd(__pa(pmd) | _KERNPG_TABLE | _PAGE_USER)); set_pgd(pgd, __pgd(__pa(pmd) | _KERNPG_TABLE | _PAGE_USER));
if (pmd != pmd_offset(pgd, 0)) { if (pmd != pmd_offset(pgd, 0)) {
printk("PAGETABLE BUG #01!\n"); printk("PAGETABLE BUG #01! %p <-> %p\n", pmd, pmd_offset(pgd,0));
return; return;
} }
} }
...@@ -128,6 +133,7 @@ static void set_pte_phys(unsigned long vaddr, ...@@ -128,6 +133,7 @@ static void set_pte_phys(unsigned long vaddr,
} }
} }
pte = pte_offset_kernel(pmd, vaddr); pte = pte_offset_kernel(pmd, vaddr);
/* CHECKME: */
if (pte_val(*pte)) if (pte_val(*pte))
pte_ERROR(*pte); pte_ERROR(*pte);
set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, prot)); set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, prot));
...@@ -151,7 +157,8 @@ void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot) ...@@ -151,7 +157,8 @@ void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
set_pte_phys(address, phys, prot); set_pte_phys(address, phys, prot);
} }
extern unsigned long start_pfn, end_pfn; unsigned long __initdata table_start, table_end;
extern pmd_t temp_boot_pmds[]; extern pmd_t temp_boot_pmds[];
static struct temp_map { static struct temp_map {
...@@ -168,21 +175,21 @@ static __init void *alloc_low_page(int *index, unsigned long *phys) ...@@ -168,21 +175,21 @@ static __init void *alloc_low_page(int *index, unsigned long *phys)
{ {
struct temp_map *ti; struct temp_map *ti;
int i; int i;
unsigned long pfn = start_pfn++, paddr; unsigned long pfn = table_end++, paddr;
void *adr; void *adr;
if (pfn >= end_pfn_map) if (pfn >= end_pfn)
panic("alloc_low_page: ran out of memory"); panic("alloc_low_page: ran out of memory");
for (i = 0; temp_mappings[i].allocated; i++) { for (i = 0; temp_mappings[i].allocated; i++) {
if (!temp_mappings[i].pmd) if (!temp_mappings[i].pmd)
panic("alloc_low_page: ran out of temp mappings"); panic("alloc_low_page: ran out of temp mappings");
} }
ti = &temp_mappings[i]; ti = &temp_mappings[i];
paddr = (pfn & (~511)) << PAGE_SHIFT; paddr = (pfn << PAGE_SHIFT) & PMD_MASK;
set_pmd(ti->pmd, __pmd(paddr | _KERNPG_TABLE | _PAGE_PSE)); set_pmd(ti->pmd, __pmd(paddr | _KERNPG_TABLE | _PAGE_PSE));
ti->allocated = 1; ti->allocated = 1;
__flush_tlb(); __flush_tlb();
adr = ti->address + (pfn & 511)*PAGE_SIZE; adr = ti->address + ((pfn << PAGE_SHIFT) & ~PMD_MASK);
*index = i; *index = i;
*phys = pfn * PAGE_SIZE; *phys = pfn * PAGE_SIZE;
return adr; return adr;
...@@ -203,20 +210,26 @@ static void __init phys_pgd_init(pgd_t *pgd, unsigned long address, unsigned lon ...@@ -203,20 +210,26 @@ static void __init phys_pgd_init(pgd_t *pgd, unsigned long address, unsigned lon
pgd = pgd + i; pgd = pgd + i;
for (; i < PTRS_PER_PGD; pgd++, i++) { for (; i < PTRS_PER_PGD; pgd++, i++) {
int map; int map;
unsigned long paddr = i*PGDIR_SIZE, pmd_phys; unsigned long paddr, pmd_phys;
pmd_t *pmd; pmd_t *pmd;
paddr = (address & PML4_MASK) + i*PGDIR_SIZE;
if (paddr >= end) { if (paddr >= end) {
for (; i < PTRS_PER_PGD; i++, pgd++) for (; i < PTRS_PER_PGD; i++, pgd++)
set_pgd(pgd, __pgd(0)); set_pgd(pgd, __pgd(0));
break; break;
} }
if (!e820_mapped(paddr, paddr+PGDIR_SIZE, 0)) {
set_pgd(pgd, __pgd(0));
continue;
}
pmd = alloc_low_page(&map, &pmd_phys); pmd = alloc_low_page(&map, &pmd_phys);
set_pgd(pgd, __pgd(pmd_phys | _KERNPG_TABLE)); set_pgd(pgd, __pgd(pmd_phys | _KERNPG_TABLE));
for (j = 0; j < PTRS_PER_PMD; pmd++, j++) { for (j = 0; j < PTRS_PER_PMD; pmd++, j++, paddr += PMD_SIZE) {
unsigned long pe; unsigned long pe;
paddr = i*PGDIR_SIZE + j*PMD_SIZE;
if (paddr >= end) { if (paddr >= end) {
for (; j < PTRS_PER_PMD; j++, pmd++) for (; j < PTRS_PER_PMD; j++, pmd++)
set_pmd(pmd, __pmd(0)); set_pmd(pmd, __pmd(0));
...@@ -239,13 +252,37 @@ void __init init_memory_mapping(void) ...@@ -239,13 +252,37 @@ void __init init_memory_mapping(void)
unsigned long adr; unsigned long adr;
unsigned long end; unsigned long end;
unsigned long next; unsigned long next;
unsigned long pgds, pmds, tables;
Dprintk("init_memory_mapping\n");
end = end_pfn_map << PAGE_SHIFT;
/*
* Find space for the kernel direct mapping tables.
* Later we should allocate these tables in the local node of the memory
* mapped. Unfortunately this is done currently before the nodes are
* discovered.
*/
pgds = (end + PGDIR_SIZE - 1) >> PGDIR_SHIFT;
pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
tables = round_up(pgds*8, PAGE_SIZE) + round_up(pmds * 8, PAGE_SIZE);
table_start = find_e820_area(0x8000, __pa_symbol(&_text), tables);
if (table_start == -1UL)
panic("Cannot find space for the kernel page tables");
table_start >>= PAGE_SHIFT;
table_end = table_start;
end += __PAGE_OFFSET; /* turn virtual */
end = PAGE_OFFSET + (end_pfn_map * PAGE_SIZE);
for (adr = PAGE_OFFSET; adr < end; adr = next) { for (adr = PAGE_OFFSET; adr < end; adr = next) {
int map; int map;
unsigned long pgd_phys; unsigned long pgd_phys;
pgd_t *pgd = alloc_low_page(&map, &pgd_phys); pgd_t *pgd = alloc_low_page(&map, &pgd_phys);
next = adr + (512UL * 1024 * 1024 * 1024); next = adr + PML4_SIZE;
if (next > end) if (next > end)
next = end; next = end;
phys_pgd_init(pgd, adr-PAGE_OFFSET, next-PAGE_OFFSET); phys_pgd_init(pgd, adr-PAGE_OFFSET, next-PAGE_OFFSET);
...@@ -254,20 +291,35 @@ void __init init_memory_mapping(void) ...@@ -254,20 +291,35 @@ void __init init_memory_mapping(void)
} }
asm volatile("movq %%cr4,%0" : "=r" (mmu_cr4_features)); asm volatile("movq %%cr4,%0" : "=r" (mmu_cr4_features));
__flush_tlb_all(); __flush_tlb_all();
early_printk("kernel direct mapping tables upto %lx @ %lx-%lx\n", end,
table_start<<PAGE_SHIFT,
table_end<<PAGE_SHIFT);
} }
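The table sizing at the top of init_memory_mapping() can be checked in isolation: each level needs 8 bytes per entry, rounded up to whole pages. A standalone sketch of that computation, assuming this port's shifts of PGDIR_SHIFT=30 (1 GB per pgd entry) and PMD_SHIFT=21 (2 MB per pmd entry):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   4096ULL
#define PMD_SHIFT   21		/* 2 MB mapped per pmd entry */
#define PGDIR_SHIFT 30		/* 1 GB mapped per pgd entry */

static uint64_t round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

int main(void)
{
	uint64_t end  = 4ULL << 30;	/* pretend the memory map ends at 4 GB */
	uint64_t pgds = (end + (1ULL << PGDIR_SHIFT) - 1) >> PGDIR_SHIFT;
	uint64_t pmds = (end + (1ULL << PMD_SHIFT) - 1) >> PMD_SHIFT;
	/* 8 bytes per entry, each level padded to whole pages */
	uint64_t tables = round_up(pgds * 8, PAGE_SIZE) +
			  round_up(pmds * 8, PAGE_SIZE);

	/* 4 GB -> 4 pgd entries + 2048 pmd entries = 5 pages of tables */
	printf("%llu pgds, %llu pmds, %llu bytes\n",
	       (unsigned long long)pgds, (unsigned long long)pmds,
	       (unsigned long long)tables);
	return 0;
}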
extern struct x8664_pda cpu_pda[NR_CPUS]; extern struct x8664_pda cpu_pda[NR_CPUS];
void __init zap_low_mappings (void) static unsigned long low_pml4[NR_CPUS];
void swap_low_mappings(void)
{ {
int i; int i;
for (i = 0; i < NR_CPUS; i++) { for (i = 0; i < NR_CPUS; i++) {
if (cpu_pda[i].level4_pgt) unsigned long t;
cpu_pda[i].level4_pgt[0] = 0; if (!cpu_pda[i].level4_pgt)
continue;
t = cpu_pda[i].level4_pgt[0];
cpu_pda[i].level4_pgt[0] = low_pml4[i];
low_pml4[i] = t;
} }
flush_tlb_all(); flush_tlb_all();
} }
void zap_low_mappings(void)
{
swap_low_mappings();
}
#ifndef CONFIG_DISCONTIGMEM
void __init paging_init(void) void __init paging_init(void)
{ {
{ {
...@@ -286,7 +338,7 @@ void __init paging_init(void) ...@@ -286,7 +338,7 @@ void __init paging_init(void)
} }
return; return;
} }
#endif
static inline int page_is_ram (unsigned long pagenr) static inline int page_is_ram (unsigned long pagenr)
{ {
...@@ -315,36 +367,45 @@ void __init mem_init(void) ...@@ -315,36 +367,45 @@ void __init mem_init(void)
int codesize, reservedpages, datasize, initsize; int codesize, reservedpages, datasize, initsize;
int tmp; int tmp;
if (!mem_map) /* How many end-of-memory variables you have, grandma! */
BUG();
max_low_pfn = end_pfn; max_low_pfn = end_pfn;
max_pfn = end_pfn; max_pfn = end_pfn;
max_mapnr = num_physpages = end_pfn; num_physpages = end_pfn;
high_memory = (void *) __va(end_pfn * PAGE_SIZE); high_memory = (void *) __va(end_pfn * PAGE_SIZE);
/* clear the zero-page */ /* clear the zero-page */
memset(empty_zero_page, 0, PAGE_SIZE); memset(empty_zero_page, 0, PAGE_SIZE);
reservedpages = 0;
/* this will put all low memory onto the freelists */ /* this will put all low memory onto the freelists */
totalram_pages += free_all_bootmem(); #ifdef CONFIG_DISCONTIGMEM
totalram_pages += numa_free_all_bootmem();
tmp = 0;
/* should count reserved pages here for all nodes */
#else
max_mapnr = end_pfn;
if (!mem_map) BUG();
after_bootmem = 1; totalram_pages += free_all_bootmem();
reservedpages = 0;
for (tmp = 0; tmp < end_pfn; tmp++) for (tmp = 0; tmp < end_pfn; tmp++)
/* /*
* Only count reserved RAM pages * Only count reserved RAM pages
*/ */
if (page_is_ram(tmp) && PageReserved(mem_map+tmp)) if (page_is_ram(tmp) && PageReserved(mem_map+tmp))
reservedpages++; reservedpages++;
#endif
after_bootmem = 1;
codesize = (unsigned long) &_etext - (unsigned long) &_text; codesize = (unsigned long) &_etext - (unsigned long) &_text;
datasize = (unsigned long) &_edata - (unsigned long) &_etext; datasize = (unsigned long) &_edata - (unsigned long) &_etext;
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin; initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;
printk("Memory: %luk/%luk available (%dk kernel code, %dk reserved, %dk data, %dk init)\n", printk("Memory: %luk/%luk available (%dk kernel code, %dk reserved, %dk data, %dk init)\n",
(unsigned long) nr_free_pages() << (PAGE_SHIFT-10), (unsigned long) nr_free_pages() << (PAGE_SHIFT-10),
max_mapnr << (PAGE_SHIFT-10), end_pfn << (PAGE_SHIFT-10),
codesize >> 10, codesize >> 10,
reservedpages << (PAGE_SHIFT-10), reservedpages << (PAGE_SHIFT-10),
datasize >> 10, datasize >> 10,
...@@ -392,3 +453,16 @@ void free_initrd_mem(unsigned long start, unsigned long end) ...@@ -392,3 +453,16 @@ void free_initrd_mem(unsigned long start, unsigned long end)
} }
} }
#endif #endif
void __init reserve_bootmem_generic(unsigned long phys, unsigned len)
{
/* Should check here against the e820 map to avoid double free */
#ifdef CONFIG_DISCONTIGMEM
int nid = phys_to_nid(phys);
if (phys < HIGH_MEMORY && nid)
panic("reserve of %lx at node %d", phys, nid);
reserve_bootmem_node(NODE_DATA(nid), phys, len);
#else
reserve_bootmem(phys, len);
#endif
}
...@@ -133,14 +133,16 @@ void * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flag ...@@ -133,14 +133,16 @@ void * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flag
*/ */
if (phys_addr < virt_to_phys(high_memory)) { if (phys_addr < virt_to_phys(high_memory)) {
char *t_addr, *t_end; char *t_addr, *t_end;
struct page *page;
t_addr = __va(phys_addr); t_addr = __va(phys_addr);
t_end = t_addr + (size - 1); t_end = t_addr + (size - 1);
#ifndef CONFIG_DISCONTIGMEM
struct page *page;
for(page = virt_to_page(t_addr); page <= virt_to_page(t_end); page++) for(page = virt_to_page(t_addr); page <= virt_to_page(t_end); page++)
if(!PageReserved(page)) if(!PageReserved(page))
return NULL; return NULL;
#endif
} }
/* /*
......
/*
* AMD K8 NUMA support.
* Discover the memory map and associated nodes.
*
* Doesn't use the ACPI SRAT table because it has a questionable license.
* Instead the northbridge registers are read directly.
* XXX in 2.5 we could use the generic SRAT code
*
* Copyright 2002,2003 Andi Kleen, SuSE Labs.
*/
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/string.h>
#include <linux/module.h>
#include <asm/io.h>
#include <linux/pci_ids.h>
#include <asm/types.h>
#include <asm/mmzone.h>
#include <asm/proto.h>
#include <asm/e820.h>
#include <asm/pci-direct.h>
#include <asm/numa.h>
static int find_northbridge(void)
{
int num;
for (num = 0; num < 32; num++) {
u32 header;
header = read_pci_config(0, num, 0, 0x00);
if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)))
continue;
header = read_pci_config(0, num, 1, 0x00);
if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)))
continue;
return num;
}
return -1;
}
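find_northbridge() works because PCI config dword 0 packs the vendor id in the low 16 bits and the device id in the high 16: the scan wants a bus-0 device whose function 0 answers as AMD device 0x1100 and whose function 1 answers as 0x1101. A sketch of that header check (PCI_VENDOR_ID_AMD is 0x1022; the register values below are made-up examples):

#include <stdio.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD 0x1022

/* config dword 0: vendor id in bits 15:0, device id in bits 31:16 */
static int is_k8_northbridge(uint32_t fn0_header, uint32_t fn1_header)
{
	return fn0_header == (PCI_VENDOR_ID_AMD | (0x1100u << 16)) &&
	       fn1_header == (PCI_VENDOR_ID_AMD | (0x1101u << 16));
}

int main(void)
{
	printf("%d\n", is_k8_northbridge(0x11001022, 0x11011022)); /* 1 */
	printf("%d\n", is_k8_northbridge(0xffffffff, 0xffffffff)); /* 0: empty slot */
	return 0;
}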
int __init k8_scan_nodes(unsigned long start, unsigned long end)
{
unsigned long prevbase;
struct node nodes[MAXNODE];
int nodeid, numnodes, maxnode, i, nb;
nb = find_northbridge();
if (nb < 0)
return nb;
printk(KERN_INFO "Scanning NUMA topology in Northbridge %d\n", nb);
numnodes = (read_pci_config(0, nb, 0, 0x60 ) >> 4) & 3;
memset(&nodes,0,sizeof(nodes));
prevbase = 0;
maxnode = -1;
for (i = 0; i < MAXNODE; i++) {
unsigned long base,limit;
base = read_pci_config(0, nb, 1, 0x40 + i*8);
limit = read_pci_config(0, nb, 1, 0x44 + i*8);
nodeid = limit & 3;
if (!limit) {
printk(KERN_INFO "Skipping node entry %d (base %lx)\n", i, base);
continue;
}
if ((base >> 8) & 3 || (limit >> 8) & 3) {
printk(KERN_ERR "Node %d using interleaving mode %lx/%lx\n",
nodeid, (base>>8)&3, (limit>>8) & 3);
return -1;
}
if (nodeid > maxnode)
maxnode = nodeid;
if ((1UL << nodeid) & nodes_present) {
printk("Node %d already present. Skipping\n", nodeid);
continue;
}
limit >>= 16;
limit <<= 24;
if (limit > end_pfn_map << PAGE_SHIFT)
limit = end_pfn_map << PAGE_SHIFT;
if (limit <= base) {
printk(KERN_INFO "Node %d beyond memory map\n", nodeid);
continue;
}
base >>= 16;
base <<= 24;
if (base < start)
base = start;
if (limit > end)
limit = end;
if (limit == base)
continue;
if (limit < base) {
printk(KERN_INFO"Node %d bogus settings %lx-%lx. Ignored.\n",
nodeid, base, limit);
continue;
}
/* Could sort here, but punt for now. Should not happen anyway. */
if (prevbase > base) {
printk(KERN_INFO "Node map not sorted %lx,%lx\n",
prevbase,base);
return -1;
}
printk(KERN_INFO "Node %d MemBase %016lx Limit %016lx\n",
nodeid, base, limit);
nodes[nodeid].start = base;
nodes[nodeid].end = limit;
prevbase = base;
}
if (maxnode <= 0)
return -1;
memnode_shift = compute_hash_shift(nodes,maxnode,end);
if (memnode_shift < 0) {
printk(KERN_ERR "No NUMA node hash function found. Contact maintainer\n");
return -1;
}
printk(KERN_INFO "Using node hash shift of %d\n", memnode_shift);
early_for_all_nodes(i) {
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
}
return 0;
}
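The base/limit decoding above relies on the K8 DRAM map registers keeping address bits 39:24 in register bits 31:16, which is why the code shifts right by 16 and then left by 24; the node id sits in the low bits of the limit register. A small decoding sketch with made-up register values:

#include <stdio.h>
#include <stdint.h>

/* the DRAM base/limit registers carry address bits 39:24 in bits 31:16 */
static uint64_t dram_reg_to_addr(uint32_t reg)
{
	return ((uint64_t)reg >> 16) << 24;
}

int main(void)
{
	uint32_t base_reg  = 0x00000000;	/* node memory starts at 0 */
	uint32_t limit_reg = 0x00400002;	/* top 0x0040 -> 1 GB, node id 2 */

	printf("node %u: %#llx-%#llx\n", limit_reg & 3,
	       (unsigned long long)dram_reg_to_addr(base_reg),
	       (unsigned long long)dram_reg_to_addr(limit_reg));
	return 0;
}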
/*
* Generic VM initialization for x86-64 NUMA setups.
* Copyright 2002,2003 Andi Kleen, SuSE Labs.
*/
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/init.h>
#include <linux/bootmem.h>
#include <linux/mmzone.h>
#include <linux/blk.h>
#include <linux/ctype.h>
#include <asm/e820.h>
#include <asm/proto.h>
#include <asm/dma.h>
#include <asm/numa.h>
#define Dprintk(x...) printk(x)
struct pglist_data *node_data[MAXNODE];
bootmem_data_t plat_node_bdata[MAX_NUMNODES];
int memnode_shift;
u8 memnodemap[NODEMAPSIZE];
static int numa_off __initdata;
unsigned long nodes_present;
int maxnode;
static int emunodes __initdata;
int compute_hash_shift(struct node *nodes, int numnodes, u64 maxmem)
{
int i;
int shift = 24;
u64 addr;
/* When in doubt use brute force. */
while (shift < 48) {
memset(memnodemap,0xff,sizeof(*memnodemap) * NODEMAPSIZE);
early_for_all_nodes (i) {
for (addr = nodes[i].start;
addr < nodes[i].end;
addr += (1UL << shift)) {
if (memnodemap[addr >> shift] != 0xff) {
printk("node %d shift %d addr %Lx conflict %d\n",
i, shift, addr, memnodemap[addr>>shift]);
goto next;
}
memnodemap[addr >> shift] = i;
}
}
return shift;
next:
shift++;
}
memset(memnodemap,0,sizeof(*memnodemap) * NODEMAPSIZE);
return -1;
}
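compute_hash_shift() is looking for the smallest shift at which no two nodes ever land in the same memnodemap bucket, i.e. a perfect hash over the node ranges. A self-contained version of the same brute-force search, using two 1 GB nodes that already hash perfectly at the minimum shift of 24 (16 MB buckets):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define NODEMAPSIZE 0xff

struct node { uint64_t start, end; };

static uint8_t memnodemap[NODEMAPSIZE];

/* smallest shift where no two nodes share an (addr >> shift) bucket */
static int hash_shift(struct node *nodes, int numnodes)
{
	int i, shift;
	uint64_t addr;

	for (shift = 24; shift < 48; shift++) {
		memset(memnodemap, 0xff, sizeof(memnodemap));
		for (i = 0; i < numnodes; i++)
			for (addr = nodes[i].start; addr < nodes[i].end;
			     addr += 1ULL << shift) {
				if (memnodemap[addr >> shift] != 0xff)
					goto next;	/* bucket already taken */
				memnodemap[addr >> shift] = (uint8_t)i;
			}
		return shift;
next:		;
	}
	return -1;
}

int main(void)
{
	/* two 1 GB nodes back to back: no 16 MB bucket straddles a
	   node boundary, so the minimum shift of 24 already works */
	struct node nodes[2] = {
		{ 0,          1ULL << 30 },
		{ 1ULL << 30, 2ULL << 30 },
	};
	printf("shift = %d\n", hash_shift(nodes, 2));	/* prints 24 */
	return 0;
}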
/* Initialize bootmem allocator for a node */
void __init setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
{
unsigned long start_pfn, end_pfn, bootmap_pages, bootmap_size, bootmap_start;
unsigned long nodedata_phys;
const int pgdat_size = round_up(sizeof(pg_data_t), PAGE_SIZE);
start = round_up(start, ZONE_ALIGN);
printk("Bootmem setup node %d %016lx-%016lx\n", nodeid, start, end);
start_pfn = start >> PAGE_SHIFT;
end_pfn = end >> PAGE_SHIFT;
nodedata_phys = find_e820_area(start, end, pgdat_size);
if (nodedata_phys == -1L)
panic("Cannot find memory pgdat in node %d\n", nodeid);
Dprintk("nodedata_phys %lx\n", nodedata_phys);
node_data[nodeid] = phys_to_virt(nodedata_phys);
memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));
NODE_DATA(nodeid)->bdata = &plat_node_bdata[nodeid];
NODE_DATA(nodeid)->node_start_pfn = start_pfn;
NODE_DATA(nodeid)->node_size = end_pfn - start_pfn;
/* Find a place for the bootmem map */
bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
bootmap_start = round_up(nodedata_phys + pgdat_size, PAGE_SIZE);
bootmap_start = find_e820_area(bootmap_start, end, bootmap_pages<<PAGE_SHIFT);
if (bootmap_start == -1L)
panic("Not enough continuous space for bootmap on node %d", nodeid);
Dprintk("bootmap start %lu pages %lu\n", bootmap_start, bootmap_pages);
bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
bootmap_start >> PAGE_SHIFT,
start_pfn, end_pfn);
e820_bootmem_free(NODE_DATA(nodeid), start, end);
reserve_bootmem_node(NODE_DATA(nodeid), nodedata_phys, pgdat_size);
reserve_bootmem_node(NODE_DATA(nodeid), bootmap_start, bootmap_pages<<PAGE_SHIFT);
if (nodeid > maxnode)
maxnode = nodeid;
nodes_present |= (1UL << nodeid);
}
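setup_node_bootmem() places the pg_data_t and then the bootmem bitmap inside the node's own memory; the bitmap needs one bit per page frame. A sketch of the arithmetic bootmem_bootmap_pages() is expected to perform (the exact rounding here is an assumption):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* one bit per page frame, rounded up to whole pages */
static unsigned long bootmap_pages(unsigned long pages)
{
	unsigned long bytes = (pages + 7) / 8;
	return (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

int main(void)
{
	unsigned long node_pages = (1UL << 30) >> PAGE_SHIFT; /* a 1 GB node */

	/* 262144 frames -> 32 KB bitmap -> 8 pages */
	printf("%lu frames need a %lu-page bootmem bitmap\n",
	       node_pages, bootmap_pages(node_pages));
	return 0;
}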
/* Initialize final allocator for a zone */
void __init setup_node_zones(int nodeid)
{
unsigned long start_pfn, end_pfn;
unsigned long zones[MAX_NR_ZONES];
unsigned long dma_end_pfn;
memset(zones, 0, sizeof(unsigned long) * MAX_NR_ZONES);
start_pfn = node_start_pfn(nodeid);
end_pfn = node_end_pfn(nodeid);
printk("setting up node %d %lx-%lx\n", nodeid, start_pfn, end_pfn);
/* All nodes > 0 have a zero length zone DMA */
dma_end_pfn = __pa(MAX_DMA_ADDRESS) >> PAGE_SHIFT;
if (start_pfn < dma_end_pfn) {
zones[ZONE_DMA] = dma_end_pfn - start_pfn;
zones[ZONE_NORMAL] = end_pfn - dma_end_pfn;
} else {
zones[ZONE_NORMAL] = end_pfn - start_pfn;
}
free_area_init_node(nodeid, NODE_DATA(nodeid), NULL, zones,
start_pfn, NULL);
}
int fake_node;
int __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
{
#ifdef CONFIG_K8_NUMA
if (!numa_off && !k8_scan_nodes(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT))
return 0;
#endif
printk(KERN_INFO "%s\n",
numa_off ? "NUMA turned off" : "No NUMA configuration found");
if (!numa_off && emunodes > 0) {
struct node nodes[MAXNODE];
unsigned long nodesize = (end_pfn << PAGE_SHIFT) / emunodes;
int i;
if (emunodes > MAXNODE)
emunodes = MAXNODE;
printk(KERN_INFO "Faking %d nodes of size %ld MB\n", emunodes, nodesize>>20);
for (i = 0; i < emunodes; i++) {
unsigned long end = (i+1)*nodesize;
if (i == emunodes-1)
end = end_pfn << PAGE_SHIFT;
nodes[i].start = i * nodesize;
nodes[i].end = end;
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
}
memnode_shift = compute_hash_shift(nodes, emunodes, nodes[i-1].end);
return 0;
}
printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
start_pfn << PAGE_SHIFT,
end_pfn << PAGE_SHIFT);
/* setup dummy node covering all memory */
fake_node = 1;
memnode_shift = 63;
memnodemap[0] = 0;
setup_node_bootmem(0, start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);
return -1;
}
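The numa=<n> emulation path above just slices memory into n equal pieces and lets the last slice absorb the rounding remainder. The same split as a standalone sketch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t total = 4ULL << 30;		/* 4 GB to carve up */
	int emunodes = 3, i;
	uint64_t nodesize = total / emunodes;

	for (i = 0; i < emunodes; i++) {
		uint64_t start = i * nodesize;
		/* the last fake node absorbs the division remainder */
		uint64_t end = (i == emunodes - 1) ? total : (i + 1) * nodesize;
		printf("fake node %d: %#llx-%#llx\n", i,
		       (unsigned long long)start, (unsigned long long)end);
	}
	return 0;
}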
unsigned long __init numa_free_all_bootmem(void)
{
int i;
unsigned long pages = 0;
for_all_nodes(i) {
pages += free_all_bootmem_node(NODE_DATA(i));
}
return pages;
}
void __init paging_init(void)
{
int i;
for_all_nodes(i) {
setup_node_zones(i);
}
}
/* Parse the "numa=" boot option: "numa=off" disables NUMA,
   "numa=<n>" fakes <n> emulated nodes. */
__init int numa_setup(char *opt)
{
if (!strncmp(opt,"off",3))
numa_off = 1;
if (isdigit(opt[0]))
emunodes = simple_strtoul(opt, NULL, 10);
return 1;
}
...@@ -47,10 +47,10 @@ SECTIONS ...@@ -47,10 +47,10 @@ SECTIONS
.vsyscall_0 -10*1024*1024: AT ((LOADADDR(.data.cacheline_aligned) + SIZEOF(.data.cacheline_aligned) + 4095) & ~(4095)) { *(.vsyscall_0) } .vsyscall_0 -10*1024*1024: AT ((LOADADDR(.data.cacheline_aligned) + SIZEOF(.data.cacheline_aligned) + 4095) & ~(4095)) { *(.vsyscall_0) }
__vsyscall_0 = LOADADDR(.vsyscall_0); __vsyscall_0 = LOADADDR(.vsyscall_0);
. = ALIGN(64); . = ALIGN(64);
.vxtime_sequence : AT ((LOADADDR(.vsyscall_0) + SIZEOF(.vsyscall_0) + 63) & ~(63)) { *(.vxtime_sequence) } .xtime_lock : AT ((LOADADDR(.vsyscall_0) + SIZEOF(.vsyscall_0) + 63) & ~(63)) { *(.xtime_lock) }
vxtime_sequence = LOADADDR(.vxtime_sequence); xtime_lock = LOADADDR(.xtime_lock);
. = ALIGN(16); . = ALIGN(16);
.hpet : AT ((LOADADDR(.vxtime_sequence) + SIZEOF(.vxtime_sequence) + 15) & ~(15)) { *(.hpet) } .hpet : AT ((LOADADDR(.xtime_lock) + SIZEOF(.xtime_lock) + 15) & ~(15)) { *(.hpet) }
hpet = LOADADDR(.hpet); hpet = LOADADDR(.hpet);
. = ALIGN(16); . = ALIGN(16);
.wall_jiffies : AT ((LOADADDR(.hpet) + SIZEOF(.hpet) + 15) & ~(15)) { *(.wall_jiffies) } .wall_jiffies : AT ((LOADADDR(.hpet) + SIZEOF(.hpet) + 15) & ~(15)) { *(.wall_jiffies) }
...@@ -59,7 +59,10 @@ SECTIONS ...@@ -59,7 +59,10 @@ SECTIONS
.sys_tz : AT ((LOADADDR(.wall_jiffies) + SIZEOF(.wall_jiffies) + 15) & ~(15)) { *(.sys_tz) } .sys_tz : AT ((LOADADDR(.wall_jiffies) + SIZEOF(.wall_jiffies) + 15) & ~(15)) { *(.sys_tz) }
sys_tz = LOADADDR(.sys_tz); sys_tz = LOADADDR(.sys_tz);
. = ALIGN(16); . = ALIGN(16);
.jiffies : AT ((LOADADDR(.sys_tz) + SIZEOF(.sys_tz) + 15) & ~(15)) { *(.jiffies) } .sysctl_vsyscall : AT ((LOADADDR(.sys_tz) + SIZEOF(.sys_tz) + 15) & ~(15)) { *(.sysctl_vsyscall) }
sysctl_vsyscall = LOADADDR(.sysctl_vsyscall);
. = ALIGN(16);
.jiffies : AT ((LOADADDR(.sysctl_vsyscall) + SIZEOF(.sysctl_vsyscall) + 15) & ~(15)) { *(.jiffies) }
jiffies = LOADADDR(.jiffies); jiffies = LOADADDR(.jiffies);
. = ALIGN(16); . = ALIGN(16);
.xtime : AT ((LOADADDR(.jiffies) + SIZEOF(.jiffies) + 15) & ~(15)) { *(.xtime) } .xtime : AT ((LOADADDR(.jiffies) + SIZEOF(.jiffies) + 15) & ~(15)) { *(.xtime) }
......
...@@ -48,8 +48,8 @@ ...@@ -48,8 +48,8 @@
#define ACPI_ASM_MACROS #define ACPI_ASM_MACROS
#define BREAKPOINT3 #define BREAKPOINT3
#define ACPI_DISABLE_IRQS() __cli() #define ACPI_DISABLE_IRQS() local_irq_disable()
#define ACPI_ENABLE_IRQS() __sti() #define ACPI_ENABLE_IRQS() local_irq_enable()
#define ACPI_FLUSH_CPU_CACHE() wbinvd() #define ACPI_FLUSH_CPU_CACHE() wbinvd()
/* /*
...@@ -142,6 +142,10 @@ extern void acpi_reserve_bootmem(void); ...@@ -142,6 +142,10 @@ extern void acpi_reserve_bootmem(void);
extern int acpi_disabled; extern int acpi_disabled;
#define dmi_broken (0)
#define BROKEN_ACPI_Sx 0x0001
#define BROKEN_INIT_AFTER_S1 0x0002
#endif /*__KERNEL__*/ #endif /*__KERNEL__*/
#endif /*_ASM_ACPI_H*/ #endif /*_ASM_ACPI_H*/
...@@ -12,15 +12,6 @@ static inline struct task_struct *get_current(void) ...@@ -12,15 +12,6 @@ static inline struct task_struct *get_current(void)
return t; return t;
} }
#define stack_current() \
({ \
struct thread_info *ti; \
__asm__("andq %%rsp,%0; ":"=r" (ti) : "0" (~8191UL)); \
ti->task; \
})
#define current get_current() #define current get_current()
#else #else
......
...@@ -135,6 +135,14 @@ static inline void set_ldt_desc(unsigned cpu, void *addr, int size) ...@@ -135,6 +135,14 @@ static inline void set_ldt_desc(unsigned cpu, void *addr, int size)
DESC_LDT, size); DESC_LDT, size);
} }
static inline void set_seg_base(unsigned cpu, int entry, void *base)
{
struct desc_struct *d = &cpu_gdt_table[cpu][entry];
d->base0 = PTR_LOW(base);
d->base1 = PTR_MIDDLE(base);
d->base2 = PTR_HIGH(base);
}
#define LDT_entry_a(info) \ #define LDT_entry_a(info) \
((((info)->base_addr & 0x0000ffff) << 16) | ((info)->limit & 0x0ffff)) ((((info)->base_addr & 0x0000ffff) << 16) | ((info)->limit & 0x0ffff))
#define LDT_entry_b(info) \ #define LDT_entry_b(info) \
......
#ifndef _ASM_X8664_DMA_MAPPING_H #ifndef _X8664_DMA_MAPPING_H
#define _ASM_X8664_DMA_MAPPING_H #define _X8664_DMA_MAPPING_H 1
#include <asm-generic/dma-mapping.h> #include <asm-generic/dma-mapping.h>
......
...@@ -47,7 +47,7 @@ extern void add_memory_region(unsigned long start, unsigned long size, ...@@ -47,7 +47,7 @@ extern void add_memory_region(unsigned long start, unsigned long size,
int type); int type);
extern void setup_memory_region(void); extern void setup_memory_region(void);
extern void contig_e820_setup(void); extern void contig_e820_setup(void);
extern void e820_end_of_ram(void); extern unsigned long e820_end_of_ram(void);
extern void e820_reserve_resources(void); extern void e820_reserve_resources(void);
extern void e820_print_map(char *who); extern void e820_print_map(char *who);
extern int e820_mapped(unsigned long start, unsigned long end, int type); extern int e820_mapped(unsigned long start, unsigned long end, int type);
......
...@@ -39,16 +39,10 @@ static inline int need_signal_i387(struct task_struct *me) ...@@ -39,16 +39,10 @@ static inline int need_signal_i387(struct task_struct *me)
#define kernel_fpu_end() stts() #define kernel_fpu_end() stts()
#define unlazy_fpu(tsk) do { \ #define unlazy_fpu(tsk) do { \
if (test_tsk_thread_flag(tsk, TIF_USEDFPU)) \ if ((tsk)->thread_info->flags & TIF_USEDFPU) \
save_init_fpu(tsk); \ save_init_fpu(tsk); \
} while (0) } while (0)
#define unlazy_current_fpu() do { \
if (test_thread_flag(TIF_USEDFPU)) \
save_init_fpu(tsk); \
} while (0)
#define clear_fpu(tsk) do { \ #define clear_fpu(tsk) do { \
if (test_tsk_thread_flag(tsk, TIF_USEDFPU)) { \ if (test_tsk_thread_flag(tsk, TIF_USEDFPU)) { \
asm volatile("fwait"); \ asm volatile("fwait"); \
...@@ -134,7 +128,7 @@ static inline void save_init_fpu( struct task_struct *tsk ) ...@@ -134,7 +128,7 @@ static inline void save_init_fpu( struct task_struct *tsk )
{ {
asm volatile( "fxsave %0 ; fnclex" asm volatile( "fxsave %0 ; fnclex"
: "=m" (tsk->thread.i387.fxsave)); : "=m" (tsk->thread.i387.fxsave));
clear_tsk_thread_flag(tsk, TIF_USEDFPU); tsk->thread_info->flags &= ~TIF_USEDFPU;
stts(); stts();
} }
......
#ifndef _ASM_IO_H #ifndef _ASM_IO_H
#define _ASM_IO_H #define _ASM_IO_H
#include <linux/config.h>
/* /*
* This file contains the definitions for the x86 IO instructions * This file contains the definitions for the x86 IO instructions
* inb/inw/inl/outb/outw/outl and the "string versions" of the same * inb/inw/inl/outb/outw/outl and the "string versions" of the same
...@@ -135,7 +137,12 @@ extern inline void * phys_to_virt(unsigned long address) ...@@ -135,7 +137,12 @@ extern inline void * phys_to_virt(unsigned long address)
/* /*
* Change "struct page" to physical address. * Change "struct page" to physical address.
*/ */
#ifdef CONFIG_DISCONTIGMEM
#include <asm/mmzone.h>
#define page_to_phys(page) ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT)
#else
#define page_to_phys(page) ((page - mem_map) << PAGE_SHIFT) #define page_to_phys(page) ((page - mem_map) << PAGE_SHIFT)
#endif
extern void * __ioremap(unsigned long offset, unsigned long size, unsigned long flags); extern void * __ioremap(unsigned long offset, unsigned long size, unsigned long flags);
......
#ifndef _ASM_MMSEGMENT_H
#define _ASM_MMSEGMENT_H 1
typedef struct {
unsigned long seg;
} mm_segment_t;
#endif
/* K8 NUMA support */
/* Copyright 2002,2003 by Andi Kleen, SuSE Labs */
/* 2.5 Version loosely based on the NUMAQ Code by Pat Gaughen. */
#ifndef _ASM_X86_64_MMZONE_H
#define _ASM_X86_64_MMZONE_H 1
#include <linux/config.h>
#ifdef CONFIG_DISCONTIGMEM
#define VIRTUAL_BUG_ON(x)
#include <asm/numnodes.h>
#include <asm/smp.h>
#define MAXNODE 8
#define NODEMAPSIZE 0xff
/* Simple perfect hash to map physical addresses to node numbers */
extern int memnode_shift;
extern u8 memnodemap[NODEMAPSIZE];
extern int maxnode;
extern struct pglist_data *node_data[];
/* kern_addr_valid below hardcodes the same algorithm */
static inline __attribute__((pure)) int phys_to_nid(unsigned long addr)
{
int nid;
VIRTUAL_BUG_ON((addr >> memnode_shift) >= NODEMAPSIZE);
nid = memnodemap[addr >> memnode_shift];
VIRTUAL_BUG_ON(nid > maxnode);
return nid;
}
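Once compute_hash_shift() has filled memnodemap, phys_to_nid() costs one shift plus one byte load. A worked example, reusing the two-1 GB-node layout and shift of 24 from the hash search (all values assumed for illustration):

#include <stdio.h>
#include <stdint.h>

#define NODEMAPSIZE 0xff

static int memnode_shift = 24;		/* result of the hash search */
static uint8_t memnodemap[NODEMAPSIZE];

static int phys_to_nid(uint64_t addr)
{
	return memnodemap[addr >> memnode_shift];
}

int main(void)
{
	int b;

	/* two 1 GB nodes: 16 MB buckets 0-63 -> node 0, 64-127 -> node 1 */
	for (b = 0; b < 64; b++)
		memnodemap[b] = 0;
	for (b = 64; b < 128; b++)
		memnodemap[b] = 1;

	printf("%d %d\n", phys_to_nid(0x12345678ULL),	/* 0 */
	       phys_to_nid(0x52345678ULL));		/* 1 */
	return 0;
}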
#define kvaddr_to_nid(kaddr) phys_to_nid(__pa(kaddr))
#define NODE_DATA(nid) (node_data[nid])
#define node_mem_map(nid) (NODE_DATA(nid)->node_mem_map)
#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
#define node_end_pfn(nid) (NODE_DATA(nid)->node_start_pfn + \
NODE_DATA(nid)->node_size)
#define node_size(nid) (NODE_DATA(nid)->node_size)
#define local_mapnr(kvaddr) \
( (__pa(kvaddr) >> PAGE_SHIFT) - node_start_pfn(kvaddr_to_nid(kvaddr)) )
#define kern_addr_valid(kvaddr) ({ \
int ok = 0; \
unsigned long index = __pa(kvaddr) >> memnode_shift; \
if (index <= NODEMAPSIZE) { \
unsigned nodeid = memnodemap[index]; \
unsigned long pfn = __pa(kvaddr) >> PAGE_SHIFT; \
unsigned long start_pfn = node_start_pfn(nodeid); \
ok = (nodeid != 0xff) && \
(pfn >= start_pfn) && \
(pfn < start_pfn + node_size(nodeid)); \
} \
ok; \
})
/* AK: this currently doesn't deal with invalid addresses. We'll see
   whether the 2.5 kernel avoids passing them (2.4 used to). */
#define pfn_to_page(pfn) ({ \
int nid = phys_to_nid(((unsigned long)(pfn)) << PAGE_SHIFT); \
((pfn) - node_start_pfn(nid)) + node_mem_map(nid); \
})
#define page_to_pfn(page) \
(long)(((page) - page_zone(page)->zone_mem_map) + page_zone(page)->zone_start_pfn)
/* AK: !DISCONTIGMEM just forces this to 1. Can't we do the same? */
#define pfn_valid(pfn) ((pfn) < num_physpages)
#endif
#endif
...@@ -185,7 +185,6 @@ extern int mp_bus_id_to_pci_bus [MAX_MP_BUSSES]; ...@@ -185,7 +185,6 @@ extern int mp_bus_id_to_pci_bus [MAX_MP_BUSSES];
extern int mp_current_pci_id; extern int mp_current_pci_id;
extern unsigned long mp_lapic_addr; extern unsigned long mp_lapic_addr;
extern int pic_mode; extern int pic_mode;
extern int using_apic_timer;
#ifdef CONFIG_ACPI_BOOT #ifdef CONFIG_ACPI_BOOT
extern void mp_register_lapic (u8 id, u8 enabled); extern void mp_register_lapic (u8 id, u8 enabled);
...@@ -199,5 +198,7 @@ extern void mp_parse_prt (void); ...@@ -199,5 +198,7 @@ extern void mp_parse_prt (void);
#endif /*CONFIG_X86_IO_APIC*/ #endif /*CONFIG_X86_IO_APIC*/
#endif #endif
extern int using_apic_timer;
#endif #endif
...@@ -67,6 +67,61 @@ ...@@ -67,6 +67,61 @@
: "=a" (low), "=d" (high) \ : "=a" (low), "=d" (high) \
: "c" (counter)) : "c" (counter))
extern inline void cpuid(int op, int *eax, int *ebx, int *ecx, int *edx)
{
__asm__("cpuid"
: "=a" (*eax),
"=b" (*ebx),
"=c" (*ecx),
"=d" (*edx)
: "0" (op));
}
/*
* CPUID functions returning a single datum
*/
extern inline unsigned int cpuid_eax(unsigned int op)
{
unsigned int eax;
__asm__("cpuid"
: "=a" (eax)
: "0" (op)
: "bx", "cx", "dx");
return eax;
}
extern inline unsigned int cpuid_ebx(unsigned int op)
{
unsigned int eax, ebx;
__asm__("cpuid"
: "=a" (eax), "=b" (ebx)
: "0" (op)
: "cx", "dx" );
return ebx;
}
extern inline unsigned int cpuid_ecx(unsigned int op)
{
unsigned int eax, ecx;
__asm__("cpuid"
: "=a" (eax), "=c" (ecx)
: "0" (op)
: "bx", "dx" );
return ecx;
}
extern inline unsigned int cpuid_edx(unsigned int op)
{
unsigned int eax, edx;
__asm__("cpuid"
: "=a" (eax), "=d" (edx)
: "0" (op)
: "bx", "cx");
return edx;
}
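As a usage sketch: CPUID leaf 0 returns the maximum supported leaf in EAX and the vendor string in EBX, EDX, ECX. The following user-space program (x86 only) mirrors the cpuid() helper above:

#include <stdio.h>
#include <string.h>

static void cpuid(int op, int *eax, int *ebx, int *ecx, int *edx)
{
	__asm__("cpuid"
		: "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
		: "0" (op));
}

int main(void)
{
	int eax, ebx, ecx, edx;
	char vendor[13];

	cpuid(0, &eax, &ebx, &ecx, &edx);
	/* leaf 0: the vendor string comes back in EBX, EDX, ECX order */
	memcpy(vendor + 0, &ebx, 4);
	memcpy(vendor + 4, &edx, 4);
	memcpy(vendor + 8, &ecx, 4);
	vendor[12] = 0;

	printf("max leaf %d, vendor \"%s\"\n", eax, vendor);
	return 0;
}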
#endif #endif
/* AMD/K8 specific MSRs */ /* AMD/K8 specific MSRs */
......
#ifndef _ASM_X8664_NUMA_H
#define _ASM_X8664_NUMA_H 1
#define MAXNODE 8
#define NODEMASK 0xff
struct node {
u64 start,end;
};
#define for_all_nodes(x) for ((x) = 0; (x) <= maxnode; (x)++) \
if ((1UL << (x)) & nodes_present)
#define early_for_all_nodes(n) \
for (n=0; n<MAXNODE;n++) if (nodes[n].start!=nodes[n].end)
extern int compute_hash_shift(struct node *nodes, int numnodes, u64 maxmem);
extern unsigned long nodes_present;
#define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
#endif
#ifndef _ASM_X8664_NUMNODES_H
#define _ASM_X8664_NUMNODES_H 1
#include <linux/config.h>
#ifdef CONFIG_DISCONTIGMEM
#define MAX_NUMNODES 8 /* APIC limit currently */
#else
#define MAX_NUMNODES 1
#endif
#endif
#ifndef _X86_64_PAGE_H #ifndef _X86_64_PAGE_H
#define _X86_64_PAGE_H #define _X86_64_PAGE_H
#include <linux/config.h>
/* PAGE_SHIFT determines the page size */ /* PAGE_SHIFT determines the page size */
#define PAGE_SHIFT 12 #define PAGE_SHIFT 12
#ifdef __ASSEMBLY__ #ifdef __ASSEMBLY__
...@@ -10,7 +12,13 @@ ...@@ -10,7 +12,13 @@
#endif #endif
#define PAGE_MASK (~(PAGE_SIZE-1)) #define PAGE_MASK (~(PAGE_SIZE-1))
#define PHYSICAL_PAGE_MASK (~(PAGE_SIZE-1) & (__PHYSICAL_MASK << PAGE_SHIFT)) #define PHYSICAL_PAGE_MASK (~(PAGE_SIZE-1) & (__PHYSICAL_MASK << PAGE_SHIFT))
#define THREAD_SIZE (2*PAGE_SIZE)
#define THREAD_ORDER 1
#ifdef __ASSEMBLY__
#define THREAD_SIZE (1 << (PAGE_SHIFT + THREAD_ORDER))
#else
#define THREAD_SIZE (1UL << (PAGE_SHIFT + THREAD_ORDER))
#endif
#define CURRENT_MASK (~(THREAD_SIZE-1)) #define CURRENT_MASK (~(THREAD_SIZE-1))
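With THREAD_ORDER 1 the kernel stack is 8 KB and the thread_info sits at its base, so masking any in-stack address with CURRENT_MASK recovers it; this is exactly what stack_current(), GET_THREAD_INFO and stack_smp_processor_id() rely on. A quick sketch of the mask (the stack address is made up):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define THREAD_ORDER 1
#define THREAD_SIZE  (1ULL << (PAGE_SHIFT + THREAD_ORDER))	/* 8 KB */
#define CURRENT_MASK (~(THREAD_SIZE - 1))

int main(void)
{
	uint64_t rsp = 0xffff8000123459f0ULL;	/* some in-stack address */
	uint64_t ti  = rsp & CURRENT_MASK;	/* base of the 8 KB stack */

	printf("rsp %#llx -> thread_info at %#llx\n",
	       (unsigned long long)rsp, (unsigned long long)ti);
	return 0;
}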
#define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1)) #define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
...@@ -58,7 +66,7 @@ typedef struct { unsigned long pgprot; } pgprot_t; ...@@ -58,7 +66,7 @@ typedef struct { unsigned long pgprot; } pgprot_t;
/* to align the pointer to the (next) page boundary */ /* to align the pointer to the (next) page boundary */
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK) #define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
/* See Documentation/x86_64/mm.txt for a description of the layout. */ /* See Documentation/x86_64/mm.txt for a description of the memory map. */
#define __START_KERNEL 0xffffffff80100000 #define __START_KERNEL 0xffffffff80100000
#define __START_KERNEL_map 0xffffffff80000000 #define __START_KERNEL_map 0xffffffff80000000
#define __PAGE_OFFSET 0x0000010000000000 #define __PAGE_OFFSET 0x0000010000000000
...@@ -100,10 +108,13 @@ extern __inline__ int get_order(unsigned long size) ...@@ -100,10 +108,13 @@ extern __inline__ int get_order(unsigned long size)
__pa(v); }) __pa(v); })
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) #define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
#ifndef CONFIG_DISCONTIGMEM
#define pfn_to_page(pfn) (mem_map + (pfn)) #define pfn_to_page(pfn) (mem_map + (pfn))
#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) #define page_to_pfn(page) ((unsigned long)((page) - mem_map))
#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
#define pfn_valid(pfn) ((pfn) < max_mapnr) #define pfn_valid(pfn) ((pfn) < max_mapnr)
#endif
#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
......
...@@ -55,7 +55,10 @@ asm volatile(op "q %0,%%gs:%c1"::"r" (val),"i"(pda_offset(field)):"memory"); bre ...@@ -55,7 +55,10 @@ asm volatile(op "q %0,%%gs:%c1"::"r" (val),"i"(pda_offset(field)):"memory"); bre
} \ } \
} while (0) } while (0)
/*
* AK: PDA read accesses should be neither volatile nor have a memory clobber.
* Unfortunately removing them causes all hell to break loose currently.
*/
#define pda_from_op(op,field) ({ \ #define pda_from_op(op,field) ({ \
typedef typeof_field(struct x8664_pda, field) T__; T__ ret__; \ typedef typeof_field(struct x8664_pda, field) T__; T__ ret__; \
switch (sizeof_field(struct x8664_pda, field)) { \ switch (sizeof_field(struct x8664_pda, field)) { \
......
...@@ -14,8 +14,7 @@ ...@@ -14,8 +14,7 @@
static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte) static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte)
{ {
set_pmd(pmd, __pmd(_PAGE_TABLE | set_pmd(pmd, __pmd(_PAGE_TABLE | (page_to_pfn(pte) << PAGE_SHIFT)));
((u64)(pte - mem_map) << PAGE_SHIFT)));
} }
extern __inline__ pmd_t *get_pmd(void) extern __inline__ pmd_t *get_pmd(void)
...@@ -76,6 +75,6 @@ extern inline void pte_free(struct page *pte) ...@@ -76,6 +75,6 @@ extern inline void pte_free(struct page *pte)
} }
#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte)) #define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
#define __pmd_free_tlb(tlb,x) do { } while (0) #define __pmd_free_tlb(tlb,x) pmd_free(x)
#endif /* _X86_64_PGALLOC_H */ #endif /* _X86_64_PGALLOC_H */
...@@ -103,6 +103,8 @@ static inline void set_pml4(pml4_t *dst, pml4_t val) ...@@ -103,6 +103,8 @@ static inline void set_pml4(pml4_t *dst, pml4_t val)
#define ptep_get_and_clear(xp) __pte(xchg(&(xp)->pte, 0)) #define ptep_get_and_clear(xp) __pte(xchg(&(xp)->pte, 0))
#define pte_same(a, b) ((a).pte == (b).pte) #define pte_same(a, b) ((a).pte == (b).pte)
#define PML4_SIZE (1UL << PML4_SHIFT)
#define PML4_MASK (~(PML4_SIZE-1))
#define PMD_SIZE (1UL << PMD_SHIFT) #define PMD_SIZE (1UL << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE-1)) #define PMD_MASK (~(PMD_SIZE-1))
#define PGDIR_SIZE (1UL << PGDIR_SHIFT) #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
...@@ -317,7 +319,8 @@ static inline pgd_t *current_pgd_offset_k(unsigned long address) ...@@ -317,7 +319,8 @@ static inline pgd_t *current_pgd_offset_k(unsigned long address)
/* PMD - Level 2 access */ /* PMD - Level 2 access */
#define pmd_page_kernel(pmd) ((unsigned long) __va(pmd_val(pmd) & PTE_MASK)) #define pmd_page_kernel(pmd) ((unsigned long) __va(pmd_val(pmd) & PTE_MASK))
#define pmd_page(pmd) (mem_map + ((pmd_val(pmd) & PTE_MASK)>>PAGE_SHIFT)) #define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
#define __pmd_offset(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1)) #define __pmd_offset(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1))
#define pmd_offset(dir, address) ((pmd_t *) pgd_page(*(dir)) + \ #define pmd_offset(dir, address) ((pmd_t *) pgd_page(*(dir)) + \
__pmd_offset(address)) __pmd_offset(address))
...@@ -372,7 +375,9 @@ typedef pte_t *pte_addr_t; ...@@ -372,7 +375,9 @@ typedef pte_t *pte_addr_t;
#endif /* !__ASSEMBLY__ */ #endif /* !__ASSEMBLY__ */
#ifndef CONFIG_DISCONTIGMEM
#define kern_addr_valid(addr) (1) #define kern_addr_valid(addr) (1)
#endif
#define io_remap_page_range remap_page_range #define io_remap_page_range remap_page_range
......
...@@ -17,6 +17,7 @@ ...@@ -17,6 +17,7 @@
#include <asm/msr.h> #include <asm/msr.h>
#include <asm/current.h> #include <asm/current.h>
#include <asm/system.h> #include <asm/system.h>
#include <asm/mmsegment.h>
#define TF_MASK 0x00000100 #define TF_MASK 0x00000100
#define IF_MASK 0x00000200 #define IF_MASK 0x00000200
...@@ -109,64 +110,6 @@ extern void dodgy_tsc(void); ...@@ -109,64 +110,6 @@ extern void dodgy_tsc(void);
#define X86_EFLAGS_VIP 0x00100000 /* Virtual Interrupt Pending */ #define X86_EFLAGS_VIP 0x00100000 /* Virtual Interrupt Pending */
#define X86_EFLAGS_ID 0x00200000 /* CPUID detection flag */ #define X86_EFLAGS_ID 0x00200000 /* CPUID detection flag */
/*
* Generic CPUID function
* FIXME: This really belongs to msr.h
*/
extern inline void cpuid(int op, int *eax, int *ebx, int *ecx, int *edx)
{
__asm__("cpuid"
: "=a" (*eax),
"=b" (*ebx),
"=c" (*ecx),
"=d" (*edx)
: "0" (op));
}
/*
* CPUID functions returning a single datum
*/
extern inline unsigned int cpuid_eax(unsigned int op)
{
unsigned int eax;
__asm__("cpuid"
: "=a" (eax)
: "0" (op)
: "bx", "cx", "dx");
return eax;
}
extern inline unsigned int cpuid_ebx(unsigned int op)
{
unsigned int eax, ebx;
__asm__("cpuid"
: "=a" (eax), "=b" (ebx)
: "0" (op)
: "cx", "dx" );
return ebx;
}
extern inline unsigned int cpuid_ecx(unsigned int op)
{
unsigned int eax, ecx;
__asm__("cpuid"
: "=a" (eax), "=c" (ecx)
: "0" (op)
: "bx", "dx" );
return ecx;
}
extern inline unsigned int cpuid_edx(unsigned int op)
{
unsigned int eax, edx;
__asm__("cpuid"
: "=a" (eax), "=d" (edx)
: "0" (op)
: "bx", "cx");
return edx;
}
/* /*
* Intel CPU features in CR4 * Intel CPU features in CR4
*/ */
...@@ -210,36 +153,6 @@ static inline void clear_in_cr4 (unsigned long mask) ...@@ -210,36 +153,6 @@ static inline void clear_in_cr4 (unsigned long mask)
:"ax"); :"ax");
} }
#if 0
/*
* Cyrix CPU configuration register indexes
*/
#define CX86_CCR0 0xc0
#define CX86_CCR1 0xc1
#define CX86_CCR2 0xc2
#define CX86_CCR3 0xc3
#define CX86_CCR4 0xe8
#define CX86_CCR5 0xe9
#define CX86_CCR6 0xea
#define CX86_CCR7 0xeb
#define CX86_DIR0 0xfe
#define CX86_DIR1 0xff
#define CX86_ARR_BASE 0xc4
#define CX86_RCR_BASE 0xdc
/*
* Cyrix CPU indexed register access macros
*/
#define getCx86(reg) ({ outb((reg), 0x22); inb(0x23); })
#define setCx86(reg, data) do { \
outb((reg), 0x22); \
outb((data), 0x23); \
} while (0)
#endif
/* /*
* Bus types * Bus types
*/ */
...@@ -286,10 +199,6 @@ union i387_union { ...@@ -286,10 +199,6 @@ union i387_union {
struct i387_fxsave_struct fxsave; struct i387_fxsave_struct fxsave;
}; };
typedef struct {
unsigned long seg;
} mm_segment_t;
struct tss_struct { struct tss_struct {
u32 reserved1; u32 reserved1;
u64 rsp0; u64 rsp0;
...@@ -302,7 +211,7 @@ struct tss_struct { ...@@ -302,7 +211,7 @@ struct tss_struct {
u16 reserved5; u16 reserved5;
u16 io_map_base; u16 io_map_base;
u32 io_bitmap[IO_BITMAP_SIZE]; u32 io_bitmap[IO_BITMAP_SIZE];
} __attribute__((packed)); } __attribute__((packed)) ____cacheline_aligned;
struct thread_struct { struct thread_struct {
unsigned long rsp0; unsigned long rsp0;
...@@ -336,6 +245,7 @@ struct thread_struct { ...@@ -336,6 +245,7 @@ struct thread_struct {
#define NMI_STACK 3 #define NMI_STACK 3
#define N_EXCEPTION_STACKS 3 /* hw limit: 7 */ #define N_EXCEPTION_STACKS 3 /* hw limit: 7 */
#define EXCEPTION_STKSZ 1024 #define EXCEPTION_STKSZ 1024
#define EXCEPTION_STK_ORDER 0
#define start_thread(regs,new_rip,new_rsp) do { \ #define start_thread(regs,new_rip,new_rsp) do { \
asm volatile("movl %0,%%fs; movl %0,%%es; movl %0,%%ds": :"r" (0)); \ asm volatile("movl %0,%%fs; movl %0,%%es; movl %0,%%ds": :"r" (0)); \
...@@ -378,6 +288,13 @@ extern inline void rep_nop(void) ...@@ -378,6 +288,13 @@ extern inline void rep_nop(void)
__asm__ __volatile__("rep;nop": : :"memory"); __asm__ __volatile__("rep;nop": : :"memory");
} }
/* Stop speculative execution */
extern inline void sync_core(void)
{
int tmp;
asm volatile("cpuid" : "=a" (tmp) : "0" (1) : "ebx","ecx","edx","memory");
}
#define cpu_has_fpu 1 #define cpu_has_fpu 1
#define ARCH_HAS_PREFETCH #define ARCH_HAS_PREFETCH
...@@ -389,7 +306,6 @@ extern inline void rep_nop(void) ...@@ -389,7 +306,6 @@ extern inline void rep_nop(void)
#define spin_lock_prefetch(x) prefetchw(x) #define spin_lock_prefetch(x) prefetchw(x)
#define cpu_relax() rep_nop() #define cpu_relax() rep_nop()
/* /*
* NSC/Cyrix CPU configuration register indexes * NSC/Cyrix CPU configuration register indexes
*/ */
...@@ -417,4 +333,11 @@ extern inline void rep_nop(void) ...@@ -417,4 +333,11 @@ extern inline void rep_nop(void)
outb((data), 0x23); \ outb((data), 0x23); \
} while (0) } while (0)
#define stack_current() \
({ \
struct thread_info *ti; \
asm("andq %%rsp,%0; ":"=r" (ti) : "0" (CURRENT_MASK)); \
ti->task; \
})
#endif /* __ASM_X86_64_PROCESSOR_H */ #endif /* __ASM_X86_64_PROCESSOR_H */
...@@ -25,6 +25,8 @@ extern void iommu_hole_init(void); ...@@ -25,6 +25,8 @@ extern void iommu_hole_init(void);
extern void do_softirq_thunk(void); extern void do_softirq_thunk(void);
extern int numa_setup(char *opt);
extern int setup_early_printk(char *); extern int setup_early_printk(char *);
extern void early_printk(const char *fmt, ...) __attribute__((format(printf,1,2))); extern void early_printk(const char *fmt, ...) __attribute__((format(printf,1,2)));
...@@ -36,18 +38,27 @@ extern unsigned long numa_free_all_bootmem(void); ...@@ -36,18 +38,27 @@ extern unsigned long numa_free_all_bootmem(void);
extern void reserve_bootmem_generic(unsigned long phys, unsigned len); extern void reserve_bootmem_generic(unsigned long phys, unsigned len);
extern void free_bootmem_generic(unsigned long phys, unsigned len); extern void free_bootmem_generic(unsigned long phys, unsigned len);
extern unsigned long start_pfn, end_pfn, end_pfn_map; extern unsigned long end_pfn_map;
extern void show_stack(unsigned long * rsp); extern void show_stack(unsigned long * rsp);
extern void exception_table_check(void); extern void exception_table_check(void);
extern int acpi_boot_init(char *); extern void acpi_reserve_bootmem(void);
extern void swap_low_mappings(void);
extern int map_syscall32(struct mm_struct *mm, unsigned long address); extern int map_syscall32(struct mm_struct *mm, unsigned long address);
extern char *syscall32_page; extern char *syscall32_page;
void setup_node_bootmem(int nodeid, unsigned long start, unsigned long end);
extern unsigned long max_mapnr;
extern unsigned long end_pfn;
extern unsigned long table_start, table_end;
struct thread_struct; struct thread_struct;
struct user_desc;
int do_set_thread_area(struct thread_struct *t, struct user_desc *u_info); int do_set_thread_area(struct thread_struct *t, struct user_desc *u_info);
int do_get_thread_area(struct thread_struct *t, struct user_desc *u_info); int do_get_thread_area(struct thread_struct *t, struct user_desc *u_info);
......
...@@ -19,13 +19,15 @@ ...@@ -19,13 +19,15 @@
#define __USER_DS 0x2b /* 5*8+3 */ #define __USER_DS 0x2b /* 5*8+3 */
#define __USER_CS 0x33 /* 6*8+3 */ #define __USER_CS 0x33 /* 6*8+3 */
#define __USER32_DS __USER_DS #define __USER32_DS __USER_DS
#define __KERNEL16_CS (GDT_ENTRY_KERNELCS16 * 8)
#define GDT_ENTRY_TLS 1 #define GDT_ENTRY_TLS 1
#define GDT_ENTRY_TSS 8 /* needs two entries */ #define GDT_ENTRY_TSS 8 /* needs two entries */
#define GDT_ENTRY_LDT 10 #define GDT_ENTRY_LDT 10
#define GDT_ENTRY_TLS_MIN 11 #define GDT_ENTRY_TLS_MIN 11
#define GDT_ENTRY_TLS_MAX 13 #define GDT_ENTRY_TLS_MAX 13
#define GDT_ENTRY_LONGBASE 14 /* 14 free */
#define GDT_ENTRY_KERNELCS16 15
#define GDT_ENTRY_TLS_ENTRIES 3 #define GDT_ENTRY_TLS_ENTRIES 3
......
...@@ -44,7 +44,9 @@ extern void smp_send_reschedule(int cpu); ...@@ -44,7 +44,9 @@ extern void smp_send_reschedule(int cpu);
extern void smp_send_reschedule_all(void); extern void smp_send_reschedule_all(void);
extern void smp_invalidate_rcv(void); /* Process an NMI */ extern void smp_invalidate_rcv(void); /* Process an NMI */
extern void (*mtrr_hook) (void); extern void (*mtrr_hook) (void);
extern void zap_low_mappings (void); extern void zap_low_mappings(void);
#define SMP_TRAMPOLINE_BASE 0x6000
/* /*
* On x86 all CPUs are mapped 1:1 to the APIC space. * On x86 all CPUs are mapped 1:1 to the APIC space.
...@@ -55,38 +57,26 @@ extern void zap_low_mappings (void); ...@@ -55,38 +57,26 @@ extern void zap_low_mappings (void);
extern volatile unsigned long cpu_callout_map; extern volatile unsigned long cpu_callout_map;
#define cpu_possible(cpu) (cpu_callout_map & (1<<(cpu))) #define cpu_possible(cpu) (cpu_callout_map & (1<<(cpu)))
#define cpu_online(cpu) (cpu_online_map & (1<<(cpu)))
extern inline int cpu_logical_map(int cpu) #define for_each_cpu(cpu, mask) \
{ for(mask = cpu_online_map; \
return cpu; cpu = __ffs(mask), mask != 0; \
} mask &= ~(1UL<<cpu))
extern inline int cpu_number_map(int cpu)
{
return cpu;
}
extern inline unsigned int num_online_cpus(void) extern inline int any_online_cpu(unsigned int mask)
{ {
return hweight32(cpu_online_map); if (mask & cpu_online_map)
} return __ffs(mask & cpu_online_map);
extern inline int find_next_cpu(unsigned cpu)
{
unsigned long left = cpu_online_map >> (cpu+1);
if (!left)
return -1; return -1;
return ffz(~left) + cpu;
} }
extern inline int find_first_cpu(void) extern inline unsigned int num_online_cpus(void)
{ {
return ffz(~cpu_online_map); return hweight32(cpu_online_map);
} }
/* RED-PEN different from i386 */
#define for_each_cpu(i) \
for((i) = find_first_cpu(); (i)>=0; (i)=find_next_cpu(i))
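The new for_each_cpu() walks the online bitmap by taking the lowest set bit with __ffs() and clearing it on each iteration. The same walk in plain C, restructured so the bit scan is never invoked on an empty mask (the macro above appears to rely on x86 tolerating that):

#include <stdio.h>

int main(void)
{
	unsigned long cpu_online_map = 0x2d;	/* CPUs 0, 2, 3 and 5 online */
	unsigned long mask;
	int cpu;

	/* mask &= mask - 1 clears the lowest set bit each pass */
	for (mask = cpu_online_map; mask != 0; mask &= mask - 1) {
		cpu = __builtin_ctzl(mask);	/* __ffs(): lowest set bit */
		printf("cpu %d\n", cpu);
	}
	return 0;
}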
static inline int num_booting_cpus(void) static inline int num_booting_cpus(void)
{ {
return hweight32(cpu_callout_map); return hweight32(cpu_callout_map);
...@@ -94,28 +84,25 @@ static inline int num_booting_cpus(void) ...@@ -94,28 +84,25 @@ static inline int num_booting_cpus(void)
extern volatile unsigned long cpu_callout_map; extern volatile unsigned long cpu_callout_map;
/*
* Some lowlevel functions might want to know about
* the real APIC ID <-> CPU # mapping.
*/
extern volatile int x86_apicid_to_cpu[NR_CPUS];
extern volatile int x86_cpu_to_apicid[NR_CPUS];
/*
* This function is needed by all SMP systems. It must _always_ be valid
* from the initial startup. We map APIC_BASE very early in page_setup(),
* so this is correct in the x86 case.
*/
#define smp_processor_id() read_pda(cpunumber) #define smp_processor_id() read_pda(cpunumber)
extern __inline int hard_smp_processor_id(void) extern __inline int hard_smp_processor_id(void)
{ {
/* we don't want to mark this access volatile - bad code generation */ /* we don't want to mark this access volatile - bad code generation */
return GET_APIC_ID(*(unsigned int *)(APIC_BASE+APIC_ID)); return GET_APIC_ID(*(unsigned int *)(APIC_BASE+APIC_ID));
} }
extern int disable_apic;
extern int slow_smp_processor_id(void);
extern inline int safe_smp_processor_id(void)
{
if (disable_apic)
return slow_smp_processor_id();
else
return hard_smp_processor_id();
}
#define cpu_online(cpu) (cpu_online_map & (1<<(cpu))) #define cpu_online(cpu) (cpu_online_map & (1<<(cpu)))
#endif /* !ASSEMBLY */ #endif /* !ASSEMBLY */
...@@ -128,6 +115,7 @@ extern __inline int hard_smp_processor_id(void) ...@@ -128,6 +115,7 @@ extern __inline int hard_smp_processor_id(void)
#ifndef CONFIG_SMP #ifndef CONFIG_SMP
#define stack_smp_processor_id() 0 #define stack_smp_processor_id() 0
#define safe_smp_processor_id() 0
#define for_each_cpu(x) (x)=0; #define for_each_cpu(x) (x)=0;
#define cpu_logical_map(x) (x) #define cpu_logical_map(x) (x)
#else #else
...@@ -135,7 +123,7 @@ extern __inline int hard_smp_processor_id(void) ...@@ -135,7 +123,7 @@ extern __inline int hard_smp_processor_id(void)
#define stack_smp_processor_id() \ #define stack_smp_processor_id() \
({ \ ({ \
struct thread_info *ti; \ struct thread_info *ti; \
__asm__("andq %%rsp,%0; ":"=r" (ti) : "0" (~8191UL)); \ __asm__("andq %%rsp,%0; ":"=r" (ti) : "0" (CURRENT_MASK)); \
ti->cpu; \ ti->cpu; \
}) })
#endif #endif
......
...@@ -15,7 +15,7 @@ extern int printk(const char * fmt, ...) ...@@ -15,7 +15,7 @@ extern int printk(const char * fmt, ...)
typedef struct { typedef struct {
volatile unsigned int lock; volatile unsigned int lock;
#if CONFIG_DEBUG_SPINLOCK #ifdef CONFIG_DEBUG_SPINLOCK
unsigned magic; unsigned magic;
#endif #endif
} spinlock_t; } spinlock_t;
...@@ -56,13 +56,56 @@ typedef struct { ...@@ -56,13 +56,56 @@ typedef struct {
/* /*
* This works. Despite all the confusion. * This works. Despite all the confusion.
* (except on PPro SMP or if we are using OOSTORE)
* (PPro errata 66, 92)
*/ */
#if !defined(CONFIG_X86_OOSTORE) && !defined(CONFIG_X86_PPRO_FENCE)
#define spin_unlock_string \ #define spin_unlock_string \
"movb $1,%0" "movb $1,%0" \
:"=m" (lock->lock) : : "memory"
static inline void _raw_spin_unlock(spinlock_t *lock)
{
#ifdef CONFIG_DEBUG_SPINLOCK
if (lock->magic != SPINLOCK_MAGIC)
BUG();
if (!spin_is_locked(lock))
BUG();
#endif
__asm__ __volatile__(
spin_unlock_string
);
}
#else
#define spin_unlock_string \
"xchgb %b0, %1" \
:"=q" (oldval), "=m" (lock->lock) \
:"0" (oldval) : "memory"
static inline void _raw_spin_unlock(spinlock_t *lock)
{
char oldval = 1;
#ifdef CONFIG_DEBUG_SPINLOCK
if (lock->magic != SPINLOCK_MAGIC)
BUG();
if (!spin_is_locked(lock))
BUG();
#endif
__asm__ __volatile__(
spin_unlock_string
);
}
#endif
static inline int _raw_spin_trylock(spinlock_t *lock) static inline int _raw_spin_trylock(spinlock_t *lock)
{ {
signed char oldval; char oldval;
__asm__ __volatile__( __asm__ __volatile__(
"xchgb %b0,%1" "xchgb %b0,%1"
:"=q" (oldval), "=m" (lock->lock) :"=q" (oldval), "=m" (lock->lock)
...@@ -85,18 +128,6 @@ printk("eip: %p\n", &&here); ...@@ -85,18 +128,6 @@ printk("eip: %p\n", &&here);
:"=m" (lock->lock) : : "memory"); :"=m" (lock->lock) : : "memory");
} }
static inline void _raw_spin_unlock(spinlock_t *lock)
{
#ifdef CONFIG_DEBUG_SPINLOCK
if (lock->magic != SPINLOCK_MAGIC)
BUG();
if (!spin_is_locked(lock))
BUG();
#endif
__asm__ __volatile__(
spin_unlock_string
:"=m" (lock->lock) : : "memory");
}
/* /*
* Read-write spinlocks, allowing multiple readers * Read-write spinlocks, allowing multiple readers
...@@ -127,6 +158,8 @@ typedef struct { ...@@ -127,6 +158,8 @@ typedef struct {
#define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0)
#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
/* /*
* On x86, we implement read-write locks as a 32-bit counter * On x86, we implement read-write locks as a 32-bit counter
* with the high bit (sign) being the "contended" bit. * with the high bit (sign) being the "contended" bit.
...@@ -136,9 +169,9 @@ typedef struct { ...@@ -136,9 +169,9 @@ typedef struct {
* Changed to use the same technique as rw semaphores. See * Changed to use the same technique as rw semaphores. See
* semaphore.h for details. -ben * semaphore.h for details. -ben
*/ */
/* the spinlock helpers are in arch/x86_64/kernel/semaphore.S */ /* the spinlock helpers are in arch/i386/kernel/semaphore.c */
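The counter scheme described in the comment works out as follows: the lock starts at RW_LOCK_BIAS (0x01000000 in this era's code, assumed here), each reader subtracts 1 and a writer subtracts the whole bias, so the count is positive with only readers, zero with a writer, and differs from the bias whenever anyone holds it; that last property is what rwlock_is_locked() tests. A sketch of the accounting (not of the atomics):

#include <stdio.h>

#define RW_LOCK_BIAS 0x01000000

int main(void)
{
	int lock = RW_LOCK_BIAS;	/* unlocked */

	lock -= 1;			/* a reader enters */
	printf("one reader:  %#x (still positive: more readers fit)\n", lock);
	lock += 1;			/* reader leaves */

	lock -= RW_LOCK_BIAS;		/* a writer enters */
	printf("writer held: %#x (zero: everyone else waits)\n", lock);

	/* rwlock_is_locked(): any value but the bias means "locked" */
	printf("locked: %d\n", lock != RW_LOCK_BIAS);
	return 0;
}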
extern inline void _raw_read_lock(rwlock_t *rw) static inline void _raw_read_lock(rwlock_t *rw)
{ {
#ifdef CONFIG_DEBUG_SPINLOCK #ifdef CONFIG_DEBUG_SPINLOCK
if (rw->magic != RWLOCK_MAGIC) if (rw->magic != RWLOCK_MAGIC)
...@@ -168,6 +201,4 @@ static inline int _raw_write_trylock(rwlock_t *lock) ...@@ -168,6 +201,4 @@ static inline int _raw_write_trylock(rwlock_t *lock)
return 0; return 0;
} }
#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
#endif /* __ASM_SPINLOCK_H */ #endif /* __ASM_SPINLOCK_H */
...@@ -83,7 +83,7 @@ extern void load_gs_index(unsigned); ...@@ -83,7 +83,7 @@ extern void load_gs_index(unsigned);
#define loadsegment(seg,value) \ #define loadsegment(seg,value) \
asm volatile("\n" \ asm volatile("\n" \
"1:\t" \ "1:\t" \
"movl %0,%%" #seg "\n" \ "movl %k0,%%" #seg "\n" \
"2:\n" \ "2:\n" \
".section .fixup,\"ax\"\n" \ ".section .fixup,\"ax\"\n" \
"3:\t" \ "3:\t" \
...@@ -94,7 +94,7 @@ extern void load_gs_index(unsigned); ...@@ -94,7 +94,7 @@ extern void load_gs_index(unsigned);
".align 8\n\t" \ ".align 8\n\t" \
".quad 1b,3b\n" \ ".quad 1b,3b\n" \
".previous" \ ".previous" \
: :"r" ((int)(value))) : :"r" (value))
#define set_debug(value,register) \ #define set_debug(value,register) \
__asm__("movq %0,%%db" #register \ __asm__("movq %0,%%db" #register \
...@@ -119,6 +119,13 @@ static inline void write_cr0(unsigned long val) ...@@ -119,6 +119,13 @@ static inline void write_cr0(unsigned long val)
asm volatile("movq %0,%%cr0" :: "r" (val)); asm volatile("movq %0,%%cr0" :: "r" (val));
} }
static inline unsigned long read_cr3(void)
{
unsigned long cr3;
asm("movq %%cr3,%0" : "=r" (cr3));
return cr3;
}
static inline unsigned long read_cr4(void) static inline unsigned long read_cr4(void)
{ {
unsigned long cr4; unsigned long cr4;
......
...@@ -9,11 +9,8 @@ ...@@ -9,11 +9,8 @@
#ifdef __KERNEL__ #ifdef __KERNEL__
#ifndef __ASSEMBLY__ #include <asm/page.h>
#include <asm/processor.h> #include <asm/types.h>
#include <linux/config.h>
#include <asm/pda.h>
#endif
/* /*
* low level task data that entry.S needs immediate access to * low level task data that entry.S needs immediate access to
...@@ -21,6 +18,10 @@ ...@@ -21,6 +18,10 @@
* - this struct shares the supervisor stack pages * - this struct shares the supervisor stack pages
*/ */
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
struct task_struct;
struct exec_domain;
#include <asm/mmsegment.h>
struct thread_info { struct thread_info {
struct task_struct *task; /* main task structure */ struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */ struct exec_domain *exec_domain; /* execution domain */
...@@ -31,7 +32,6 @@ struct thread_info { ...@@ -31,7 +32,6 @@ struct thread_info {
mm_segment_t addr_limit; mm_segment_t addr_limit;
struct restart_block restart_block; struct restart_block restart_block;
}; };
#endif #endif
/* /*
...@@ -55,27 +55,17 @@ struct thread_info { ...@@ -55,27 +55,17 @@ struct thread_info {
#define init_thread_info (init_thread_union.thread_info) #define init_thread_info (init_thread_union.thread_info)
#define init_stack (init_thread_union.stack) #define init_stack (init_thread_union.stack)
/* how to get the thread information struct from C */
#define THREAD_SIZE (2*PAGE_SIZE)
static inline struct thread_info *current_thread_info(void) static inline struct thread_info *current_thread_info(void)
{ {
struct thread_info *ti; struct thread_info *ti;
ti = (void *)read_pda(kernelstack) + PDA_STACKOFFSET - THREAD_SIZE; asm("andq %%rsp,%0; ":"=r" (ti) : "0" (CURRENT_MASK));
return ti;
}
static inline struct thread_info *stack_thread_info(void)
{
struct thread_info *ti;
__asm__("andq %%rsp,%0; ":"=r" (ti) : "0" (~8191UL));
return ti; return ti;
} }
/* thread information allocation */ /* thread information allocation */
#define alloc_thread_info() ((struct thread_info *) __get_free_pages(GFP_KERNEL,1)) #define alloc_thread_info() \
#define free_thread_info(ti) free_pages((unsigned long) (ti), 1) ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER))
#define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
#define get_thread_info(ti) get_task_struct((ti)->task) #define get_thread_info(ti) get_task_struct((ti)->task)
#define put_thread_info(ti) put_task_struct((ti)->task) #define put_thread_info(ti) put_task_struct((ti)->task)
...@@ -84,7 +74,7 @@ static inline struct thread_info *stack_thread_info(void) ...@@ -84,7 +74,7 @@ static inline struct thread_info *stack_thread_info(void)
/* how to get the thread information struct from ASM */ /* how to get the thread information struct from ASM */
/* only works on the process stack. otherwise get it via the PDA. */ /* only works on the process stack. otherwise get it via the PDA. */
#define GET_THREAD_INFO(reg) \ #define GET_THREAD_INFO(reg) \
movq $-8192, reg; \ movq $CURRENT_MASK, reg; \
andq %rsp, reg andq %rsp, reg
#endif #endif
......
/* /*
* linux/include/asm-x8664/timex.h * linux/include/asm-x86_64/timex.h
* *
* x8664 architecture timex specifications * x86-64 architecture timex specifications
*/ */
#ifndef _ASMx8664_TIMEX_H #ifndef _ASMx8664_TIMEX_H
#define _ASMx8664_TIMEX_H #define _ASMx8664_TIMEX_H
...@@ -16,20 +16,6 @@ ...@@ -16,20 +16,6 @@
(1000000/CLOCK_TICK_FACTOR) / (CLOCK_TICK_RATE/CLOCK_TICK_FACTOR)) \ (1000000/CLOCK_TICK_FACTOR) / (CLOCK_TICK_RATE/CLOCK_TICK_FACTOR)) \
<< (SHIFT_SCALE-SHIFT_HZ)) / HZ) << (SHIFT_SCALE-SHIFT_HZ)) / HZ)
/*
* Standard way to access the cycle counter on i586+ CPUs.
* Currently only used on SMP.
*
* If you really have a SMP machine with i486 chips or older,
* compile for that, and this will just always return zero.
* That's ok, it just means that the nicer scheduling heuristics
* won't work for you.
*
* We only use the low 32 bits, and we'd simply better make sure
* that we reschedule before that wraps. Scheduling at least every
* four billion cycles just basically sounds like a good idea,
* regardless of how fast the machine is.
*/
typedef unsigned long long cycles_t; typedef unsigned long long cycles_t;
extern cycles_t cacheflush_time; extern cycles_t cacheflush_time;
......
#ifndef _ASM_X86_64_TOPOLOGY_H #ifndef _ASM_X86_64_TOPOLOGY_H
#define _ASM_X86_64_TOPOLOGY_H #define _ASM_X86_64_TOPOLOGY_H
#include <linux/config.h>
#ifdef CONFIG_DISCONTIGMEM
/* Map the K8 CPU local memory controllers to a simple 1:1 CPU:NODE topology */
extern int fake_node;
extern unsigned long cpu_online_map;
#define cpu_to_node(cpu) (fake_node ? 0 : (cpu))
#define memblk_to_node(memblk) (fake_node ? 0 : (memblk))
#define parent_node(node) (node)
#define node_to_first_cpu(node) (fake_node ? 0 : (node))
#define node_to_cpu_mask(node) (fake_node ? cpu_online_map : (1UL << (node)))
#define node_to_memblk(node) (node)
#define NODE_BALANCE_RATE 30 /* CHECKME */
#endif
#include <asm-generic/topology.h> #include <asm-generic/topology.h>
#endif /* _ASM_X86_64_TOPOLOGY_H */ #endif
...@@ -500,8 +500,10 @@ __SYSCALL(__NR_set_tid_address, sys_set_tid_address) ...@@ -500,8 +500,10 @@ __SYSCALL(__NR_set_tid_address, sys_set_tid_address)
__SYSCALL(__NR_restart_syscall, sys_restart_syscall) __SYSCALL(__NR_restart_syscall, sys_restart_syscall)
#define __NR_semtimedop 220 #define __NR_semtimedop 220
__SYSCALL(__NR_semtimedop, sys_semtimedop) __SYSCALL(__NR_semtimedop, sys_semtimedop)
#define __NR_fadvise64 221
__SYSCALL(__NR_fadvise64, sys_fadvise64)
#define __NR_syscall_max __NR_semtimedop #define __NR_syscall_max __NR_fadvise64
#ifndef __NO_STUBS #ifndef __NO_STUBS
/* user-visible error numbers are in the range -1 - -4095 */ /* user-visible error numbers are in the range -1 - -4095 */
......
...@@ -2,6 +2,7 @@ ...@@ -2,6 +2,7 @@
#define _ASM_X86_64_VSYSCALL_H_ #define _ASM_X86_64_VSYSCALL_H_
#include <linux/time.h> #include <linux/time.h>
#include <linux/seqlock.h>
enum vsyscall_num { enum vsyscall_num {
__NR_vgettimeofday, __NR_vgettimeofday,
...@@ -19,8 +20,10 @@ enum vsyscall_num { ...@@ -19,8 +20,10 @@ enum vsyscall_num {
#define __section_wall_jiffies __attribute__ ((unused, __section__ (".wall_jiffies"), aligned(16))) #define __section_wall_jiffies __attribute__ ((unused, __section__ (".wall_jiffies"), aligned(16)))
#define __section_jiffies __attribute__ ((unused, __section__ (".jiffies"), aligned(16))) #define __section_jiffies __attribute__ ((unused, __section__ (".jiffies"), aligned(16)))
#define __section_sys_tz __attribute__ ((unused, __section__ (".sys_tz"), aligned(16))) #define __section_sys_tz __attribute__ ((unused, __section__ (".sys_tz"), aligned(16)))
#define __section_sysctl_vsyscall __attribute__ ((unused, __section__ (".sysctl_vsyscall"), aligned(16)))
#define __section_xtime __attribute__ ((unused, __section__ (".xtime"), aligned(16))) #define __section_xtime __attribute__ ((unused, __section__ (".xtime"), aligned(16)))
#define __section_vxtime_sequence __attribute__ ((unused, __section__ (".vxtime_sequence"), aligned(16))) #define __section_xtime_lock __attribute__ ((unused, __section__ (".xtime_lock"), aligned(L1_CACHE_BYTES)))
struct hpet_data { struct hpet_data {
long address; /* base address */ long address; /* base address */
...@@ -36,21 +39,21 @@ struct hpet_data { ...@@ -36,21 +39,21 @@ struct hpet_data {
#define hpet_writel(d,a) writel(d, fix_to_virt(FIX_HPET_BASE) + a) #define hpet_writel(d,a) writel(d, fix_to_virt(FIX_HPET_BASE) + a)
/* vsyscall space (readonly) */ /* vsyscall space (readonly) */
extern long __vxtime_sequence[2];
extern struct hpet_data __hpet; extern struct hpet_data __hpet;
extern struct timespec __xtime; extern struct timespec __xtime;
extern volatile unsigned long __jiffies; extern volatile unsigned long __jiffies;
extern unsigned long __wall_jiffies; extern unsigned long __wall_jiffies;
extern struct timezone __sys_tz; extern struct timezone __sys_tz;
extern seqlock_t __xtime_lock;
/* kernel space (writeable) */ /* kernel space (writeable) */
extern long vxtime_sequence[2];
extern struct hpet_data hpet; extern struct hpet_data hpet;
extern unsigned long wall_jiffies; extern unsigned long wall_jiffies;
extern struct timezone sys_tz; extern struct timezone sys_tz;
extern int sysctl_vsyscall;
extern seqlock_t xtime_lock;
#define vxtime_lock() do { vxtime_sequence[0]++; wmb(); } while(0) #define ARCH_HAVE_XTIME_LOCK 1
#define vxtime_unlock() do { wmb(); vxtime_sequence[1]++; } while (0)
#endif /* __KERNEL__ */ #endif /* __KERNEL__ */
......
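The switch from vxtime_sequence to a seqlock means the lockless gettimeofday reader now follows the standard pattern: sample the sequence, read the data, and retry if the sequence moved (an odd value marks a writer in progress). A minimal single-threaded user-space model of that read side; the real seqlock in linux/seqlock.h additionally takes a spinlock on the write side:

#include <stdio.h>

struct seqlock { volatile unsigned seq; };

static unsigned read_seqbegin(struct seqlock *sl)
{
	unsigned seq;

	do {
		seq = sl->seq;		/* odd value: writer in progress */
	} while (seq & 1);
	__asm__ __volatile__("" ::: "memory");	/* read data after seq */
	return seq;
}

static int read_seqretry(struct seqlock *sl, unsigned start)
{
	__asm__ __volatile__("" ::: "memory");	/* read data before recheck */
	return sl->seq != start;	/* changed: the snapshot may be torn */
}

int main(void)
{
	struct seqlock xtime_lock = { 0 };
	unsigned long jiffies = 12345, snapshot;
	unsigned seq;

	do {
		seq = read_seqbegin(&xtime_lock);
		snapshot = jiffies;	/* read the protected data */
	} while (read_seqretry(&xtime_lock, seq));

	printf("consistent snapshot: %lu\n", snapshot);
	return 0;
}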