Commits · ffb6017563aa15f9a8cff9a30b861d42c2695894 · nexedi / linux

13 Feb, 2007 40 commits

[PATCH] x86-64: x86_64 - Fix FS/GS registers for VT execution · ffb60175

Zachary Amsden authored Feb 13, 2007

Initialize FS and GS to __KERNEL_DS as well. The actual value of them is not
important, but it is important to reload them in protected mode. At this time,
they still retain the real mode values from initial boot. VT disallows
execution of code under such conditions, which means hardware virtualization
can not be used to boot the kernel on Intel platforms, making the boot time
painfully slow.

This requires moving the GS load before the load of GS_BASE, so just move
all the segments loads there to keep them together in the code.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>

ffb60175

[PATCH] x86-64: Unexport __supported_pte_mask · 9a11ff68

Andi Kleen authored Feb 13, 2007

The symbol is needed to manipulate page tables, and modules shouldn't
do that.

Leftover from 2.4, but no in tree module should need it now.
Signed-off-by: Andi Kleen <ak@suse.de>

9a11ff68

[PATCH] x86-64: Check return value of putreg in PTRACE_SETREGS · f49481bc

Andi Kleen authored Feb 13, 2007

This means if an illegal value is set for the segment registers there
ptrace will error out now with an errno instead of silently ignoring
it.
Signed-off-by: Andi Kleen <ak@suse.de>

f49481bc

[PATCH] x86-64: - Ignore long SMI interrupts in clock calibration code - update 1 · 2f7a2a79

Jack Steiner authored Feb 13, 2007

Add failsafe mechanism to HPET/TSC clock calibration.
Signed-off-by: Jack Steiner <steiner@sgi.com>

Updated to include failsafe mechanism & additional community feedback.
Patch built on latest 2.6.20-rc4-mm1 tree.
Signed-off-by: Andi Kleen <ak@suse.de>

2f7a2a79

[PATCH] i386: fix size_or_mask and size_and_mask · 6c5806ca

Andreas Herrmann authored Feb 13, 2007

mtrr: fix size_or_mask and size_and_mask

This fixes two bugs in /proc/mtrr interface:
o If physical address size crosses the 44 bit boundary
  size_or_mask is evaluated wrong.
o size_and_mask limits width of physical base
  address for an MTRR to be less than 44 bits.

TBD: later patch had one more change, but I think that was bogus.
TBD: need to double check
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>

6c5806ca

[PATCH] i386: Convert /proc/apm to seqfile · 016d6f35

Alexey Dobriyan authored Feb 13, 2007

Byte-to-byte identical /proc/apm here.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>

016d6f35

[PATCH] x86-64: Fix preprocessor condition · b0957f1a

Josef 'Jeff' Sipek authored Feb 13, 2007

Old code was legal standard C, but apparently not sparse-C.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andi Kleen <ak@suse.de>

b0957f1a

[PATCH] i386: use smp_call_function_single() · ad4e680f

Alexey Dobriyan authored Feb 13, 2007

It will execure cpuid only on the cpu we need.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>

ad4e680f

[PATCH] i386: use smp_call_function_single() · d958f143

Alexey Dobriyan authored Feb 13, 2007

It will execute rdmsr and wrmsr only on the cpu we need.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>

d958f143

[PATCH] x86-64: Kconfig typos · edf8dd36

Nicolas Kaiser authored Feb 13, 2007

Some typos in Kconfig.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Andi Kleen <ak@suse.de>

edf8dd36

[PATCH] i386: Small cleanup to TLB flush code · 8c40ad02

Andi Kleen authored Feb 13, 2007

- Remove outdated comment
- Use cpu_relax() in a busy loop
Signed-off-by: Andi Kleen <ak@suse.de>

8c40ad02

[PATCH] x86-64: remove get_pmd() · 930f8b8b

Jan Beulich authored Feb 13, 2007

Function is dead.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

930f8b8b

[PATCH] x86-64: Allow to run a program when a machine check event is detected · a98f0dd3

Andi Kleen authored Feb 13, 2007

When a machine check event is detected (including a AMD RevF threshold
overflow event) allow to run a "trigger" program. This allows user space
to react to such events sooner.

The trigger is configured using a new trigger entry in the
machinecheck sysfs interface. It is currently shared between
all CPUs.

I also fixed the AMD threshold handler to run the machine
check polling code immediately to actually log any events
that might have caused the threshold interrupt.

Also added some documentation for the mce sysfs interface.
Signed-off-by: Andi Kleen <ak@suse.de>

a98f0dd3

[PATCH] x86-64: Tighten mce_amd driver MSR reads · 24ce0e96

Jan Beulich authored Feb 13, 2007

while debugging an unrelated problem in Xen, I noticed odd reads from
non-existent MSRs. Having now found time to look why these happen, I
came up with below patch, which
- prevents accessing MCi_MISCj with j > 0 when the block pointer in
MCi_MISC0 is zero
- accesses only contiguous MCi_MISCj until a non-implemented one is
found
- doesn't touch unimplemented blocks in mce_threshold_interrupt at all
- gives names to two bits previously derived from MASK_VALID_HI (it
took me some time to understand the code without this)

The first three items, besides being apparently closer to the spec, should
namely help cutting down on the time mce_threshold_interrupt() takes.
Signed-off-by: Andi Kleen <ak@suse.de>

24ce0e96

[PATCH] x86: simplify notify_page_fault() · 9b355897

Jan Beulich authored Feb 13, 2007

Remove all parameters from this function that aren't really variable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

9b355897

[PATCH] x86-64: list x86_64 quilt tree · 6a051565

Randy Dunlap authored Feb 13, 2007

List x86_64 quilt tree in MAINTAINERS.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andi Kleen <ak@suse.de>

6a051565

[PATCH] x86-64: cleanup Doc/x86_64/ files · 57d30772

Randy Dunlap authored Feb 13, 2007

Fix typos.
Lots of whitespace changes for readability and consistency.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andi Kleen <ak@suse.de>

57d30772

[PATCH] i386: Handle 32 bit PerfMon Counter writes cleanly in oprofile · 44264261

Venkatesh Pallipadi authored Feb 13, 2007

Handle these 32 bit perfmon counter MSR writes cleanly in oprofile.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>

44264261

[PATCH] i386: Handle 32 bit PerfMon Counter writes cleanly in i386 nmi_watchdog · 90ce4bc4

Venkatesh Pallipadi authored Feb 13, 2007

Change i386 nmi handler to handle 32 bit perfmon counter MSR writes cleanly.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>

90ce4bc4

[PATCH] x86-64: Handle 32 bit PerfMon Counter writes cleanly in x86_64 nmi_watchdog · 16761939

Venkatesh Pallipadi authored Feb 13, 2007

P6 CPUs and Core/Core 2 CPUs which has 'architectural perf mon' feature,
only supports write of low 32 bits in Performance Monitoring Counters.
Bits 32..39 are sign extended based on bit 31 and bits 40..63 are reserved
and should be zero.

This patch:

Change x86_64 nmi handler to handle this case cleanly.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>

16761939

[PATCH] x86-64: Use constant instead of raw number in x86_64 ioperm.c · 4c3cbf75

Glauber de Oliveira Costa authored Feb 13, 2007

This is a tiny cleanup to increase readability
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>

4c3cbf75

[PATCH] x86-64: Remove fastcall references in x86_64 code · c49c5330

Glauber de Oliveira Costa authored Feb 13, 2007

Unlike x86, x86_64 already passes arguments in registers.  The use of
regparm attribute makes no difference in produced code, and the use of
fastcall just bloats the code.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>

c49c5330

[PATCH] x86-64: Fix fake numa for x86_64 machines with big IO hole · 53fee04f

Rohit Seth authored Feb 13, 2007

This patch resolves the issue of running with numa=fake=X on kernel command
line on x86_64 machines that have big IO hole.  While calculating the size
of each node now we look at the total hole size in that range.

Previously there were nodes that only had IO holes in them causing kernel
boot problems.  We now use the NODE_MIN_SIZE (64MB) as the minimum size of
memory that any node must have.  We reduce the number of allocated nodes if
the number of nodes specified on kernel command line results in any node
getting memory smaller than NODE_MIN_SIZE.

This change allows the extra memory to be incremented in NODE_MIN_SIZE
granule and uniformly distribute among as many nodes (called big nodes) as
possible.

[akpm@osdl.org: build fix]
Signed-off-by: David Rientjes <reintjes@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>

53fee04f

[PATCH] i386: romsignature/checksum cleanup · 3b3d5e1d

Rene Herman authored Feb 13, 2007

Use adding __init to romsignature() (it's only called from probe_roms()
which is itself __init) as an excuse to submit a pedantic cleanup.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>

3b3d5e1d

[PATCH] i386: improve sched_clock() on i686 · f9690982

Ingo Molnar authored Feb 13, 2007

Clean up sched_clock() on i686: it will use the TSC if available and falls
back to jiffies only if the user asked for it to be disabled via notsc or
the CPU calibration code didnt figure out the right cpu_khz.

This generally makes the scheduler timestamps more finegrained, on all
hardware.  (the current scheduler is pretty resistant against asynchronous
sched_clock() values on different CPUs, it will allow at most up to a jiffy
of jitter.)

Also simplify sched_clock()'s check for TSC availability: propagate the
desire and ability to use the TSC into the tsc_disable flag, previously
this flag only indicated whether the notsc option was passed.  This makes
the rare low-res sched_clock() codepath a single branch off a read-mostly
flag.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>

f9690982

[PATCH] i386: add idle notifier · 2ff2d3d7

Stephane Eranian authored Feb 13, 2007

Add a notifier mechanism to the low level idle loop.  You can register a
callback function which gets invoked on entry and exit from the low level idle
loop.  The low level idle loop is defined as the polling loop, low-power call,
or the mwait instruction.  Interrupts processed by the idle thread are not
considered part of the low level loop.

The notifier can be used to measure precisely how much is spent in useless
execution (or low power mode).  The perfmon subsystem uses it to turn on/off
monitoring.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>

2ff2d3d7

[PATCH] i386: arch/i386/kernel/cpu/mcheck/mce.c should #include <asm/mce.h> · 86a97883

Adrian Bunk authored Feb 13, 2007

Every file should include the headers containing the prototypes for
it's global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>

86a97883

[PATCH] generic: Break init() in two parts to avoid MODPOST warnings · ee5bfa64

Vivek Goyal authored Feb 13, 2007

o init() is a non __init function in .text section but it calls many
functions which are in .init.text section. Hence MODPOST generates lots
of cross reference warnings on i386 if compiled with CONFIG_RELOCATABLE=y

WARNING: vmlinux - Section mismatch: reference to .init.text:smp_prepare_cpus from .text between 'init' (at offset 0xc0101049) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init from .text between 'init' (at offset 0xc010104e) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:spawn_ksoftirqd from .text between 'init' (at offset 0xc0101053) and 'rest_init'

o This patch breaks down init() in two parts. One part which can go
in .init.text section and can be freed and other part which has to
be non __init(init_post()). Now init() calls init_post() and init_post()
does not call any functions present in .init sections. Hence getting
rid of warnings.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>

ee5bfa64

[PATCH] i386: move startup_32() in text.head section · f8657e1b

Vivek Goyal authored Feb 13, 2007

o Entry startup_32 was in .text section but it was accessing some init
  data too and it prompts MODPOST to generate compilation warnings.

WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100029) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100037) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to
.init.data:init_pg_tables_end from .text between '_text' (at offset
0xc0100099) and 'startup_32_smp'

o Can't move startup_32 to .init.text as this entry point has to be at the
  start of bzImage. Hence moved startup_32 to a new section .text.head and
  instructed MODPOST to not to generate warnings if init data is being
  accessed from .text.head section. This code has been audited.

o SMP boot up code (startup_32_smp) can go into .init.text if CPU hotplug
  is not supported. Otherwise it generates more warnings

WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100126) and 'is486'
WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100130) and 'is486'
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>

f8657e1b

[PATCH] i386: Paravirt debug defaults off · 7c0b49f9

Zachary Amsden authored Feb 13, 2007

Deliberate register clobber around performance critical inline code is great for
testing, bad to leave on by default.  Many people ship with DEBUG_KERNEL turned
on, so stop making DEBUG_PARAVIRT default on.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>

7c0b49f9

[PATCH] i386: Vmi timer race · 90736e20

Zachary Amsden authored Feb 13, 2007

Because timer code moves around, and we might eventually move our init to a
late_time_init hook, save and restore IRQs around this code because it is
definitely not interrupt safe.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>

90736e20

[PATCH] i386: Kprobe rpl fix · ac3b6faf

Zachary Amsden authored Feb 13, 2007

Kprobes bugfix for paravirt compatibility - RPL on the CS when inserting
BPs must match running kernel.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Eric Biederman <ebiederm@xmission.com>

ac3b6faf

[PATCH] i386: Profile pc badness · 7b355202

Zachary Amsden authored Feb 13, 2007

Profile_pc was broken when using paravirtualization because the
assumption the kernel was running at CPL 0 was violated, causing
bad logic to read a random value off the stack.

The only way to be in kernel lock functions is to be in kernel
code, so validate that assumption explicitly by checking the CS
value.  We don't want to be fooled by BIOS / APM segments and
try to read those stacks, so only match KERNEL_CS.

I moved some stuff in segment.h to make it prettier.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>

7b355202

[PATCH] i386: vMI timer patches · bbab4f3b

Zachary Amsden authored Feb 13, 2007

VMI timer code.  It works by taking over the local APIC clock when APIC is
configured, which requires a couple hooks into the APIC code.  The backend
timer code could be commonized into the timer infrastructure, but there are
some pieces missing (stolen time, in particular), and the exact semantics of
when to do accounting for NO_IDLE need to be shared between different
hypervisors as well.  So for now, VMI timer is a separate module.

[Adrian Bunk: cleanups]

Subject: VMI timer patches
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>

bbab4f3b

[PATCH] i386: vMI backend for paravirt-ops · 7ce0bcfd

Zachary Amsden authored Feb 13, 2007

Fairly straightforward implementation of VMI backend for paravirt-ops.

[Adrian Bunk: some cleanups]
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>

7ce0bcfd

[PATCH] i386: SMP boot hook for paravirt · ae5da273

Zachary Amsden authored Feb 13, 2007

Add VMI SMP boot hook.  We emulate a regular boot sequence and use the same
APIC IPI initiation, we just poke magic values to load into the CPU state when
the startup IPI is received, rather than having to jump through a real mode
trampoline.

This is all that was needed to get SMP to work.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>

ae5da273

[PATCH] i386: iOPL handling for paravirt guests · 8b151144

Zachary Amsden authored Feb 13, 2007

I found a clever way to make the extra IOPL switching invisible to
non-paravirt compiles - since kernel_rpl is statically defined to be zero
there, and only non-zero rpl kernel have a problem restoring IOPL, as popf
does not restore IOPL flags unless run at CPL-0.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>

8b151144

[PATCH] i386: paravirt CPU hypercall batching mode · 9226d125

Zachary Amsden authored Feb 13, 2007

The VMI ROM has a mode where hypercalls can be queued and batched.  This turns
out to be a significant win during context switch, but must be done at a
specific point before side effects to CPU state are visible to subsequent
instructions.  This is similar to the MMU batching hooks already provided.
The same hooks could be used by the Xen backend to implement a context switch
multicall.

To explain a bit more about lazy modes in the paravirt patches, basically, the
idea is that only one of lazy CPU or MMU mode can be active at any given time.
 Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of
multiple PTE updates (say, inside a remap loop), but to avoid keeping some
kind of state machine about when to flush cpu or mmu updates, we just allow
one or the other to be active.  Although there is no real reason a more
comprehensive scheme could not be implemented, there is also no demonstrated
need for this extra complexity.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>

9226d125

[PATCH] MM: page allocation hooks for VMI backend · c119ecce

Zachary Amsden authored Feb 13, 2007

The VMI backend uses explicit page type notification to track shadow page
tables.  The allocation of page table roots is especially tricky.  We need to
clone the root for non-PAE mode while it is protected under the pgd lock to
correctly copy the shadow.

We don't need to allocate pgds in PAE mode, (PDPs in Intel terminology) as
they only have 4 entries, and are cached entirely by the processor, which
makes shadowing them rather simple.

For base page table level allocation, pmd_populate provides the exact hook
point we need.  Also, we need to allocate pages when splitting a large page,
and we must release pages before returning the page to any free pool.

Despite being required with these slightly odd semantics for VMI, Xen also
uses these hooks to determine the exact moment when page tables are created or
released.

AK: All nops for other architectures
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>

c119ecce

[PATCH] i386: arch/i386/kernel/e820.c should #include <asm/setup.h · 90611fe9

Adrian Bunk authored Feb 13, 2007

Every file should #include the headers containing the prototypes for
its global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>

90611fe9