Commits · ffde22ac53b6d6b1d7206f1172176a667eead778 · Kirill Smelkov / linux

03 Dec, 2009 40 commits

KVM: Xen PV-on-HVM guest support · ffde22ac

Ed Swierk authored Oct 15, 2009

Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ffde22ac

KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check · 94c30d9c

Jan Kiszka authored Oct 12, 2009

This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

94c30d9c

KVM: VMX: fix handle_pause declaration · 9fb41ba8
Marcelo Tosatti authored Oct 12, 2009
```
There's no kvm_run argument anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
```
9fb41ba8

KVM: x86: Harden against cpufreq · 6b7d7e76

Zachary Amsden authored Oct 09, 2009

If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fallback to the measured TSC khz.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6b7d7e76

KVM: SVM: Support Pause Filter in AMD processors · 565d0998

Mark Langsdorf authored Oct 06, 2009

New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature.  This feature creates a new field in the VMCB
called Pause Filter Count.  If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting.  When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.

This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.

Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand.  Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID
bit.

On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm thati caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

565d0998

KVM: VMX: Add support for Pause-Loop Exiting · 4b8d54f9

Zhai, Edwin authored Oct 09, 2009

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time, between this execution of PAUSE and previous one, exceeds the
PLE_Gap, processor consider this PAUSE belongs to a new loop.
Otherwise, processor determins the the total execution time of this loop(since
1st PAUSE in this loop), and triggers a VM exit if total time exceeds the
PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out after hold a spinlock, then other VPs for same lock are sched-in
to waste the CPU time.

Our tests indicate that most spinlocks are held for less than 212 cycles.
Performance tests show that with 2X LP over-commitment we can get +2% perf
improvement for kernel build(Even more perf gain with more LPs).
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

4b8d54f9

KVM: introduce kvm_vcpu_on_spin · d255f4f2

Zhai, Edwin authored Oct 09, 2009

Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.
Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d255f4f2

KVM: SVM: Remove nsvm_printk debugging code · d36f19e9

Joerg Roedel authored Oct 09, 2009

With all important informations now delivered through
tracepoints we can savely remove the nsvm_printk debugging
code for nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d36f19e9

KVM: SVM: Add tracepoint for skinit instruction · 532a46b9

Joerg Roedel authored Oct 09, 2009

This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extenstion not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

532a46b9

KVM: SVM: Add tracepoint for invlpga instruction · ec1ff790

Joerg Roedel authored Oct 09, 2009

This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ec1ff790

KVM: SVM: Add tracepoint for #vmexit because intr pending · 236649de

Joerg Roedel authored Oct 09, 2009

This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

236649de

KVM: SVM: Add tracepoint for injected #vmexit · 17897f36

Joerg Roedel authored Oct 09, 2009

This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17897f36

KVM: SVM: Add tracepoint for nested #vmexit · d8cabddf

Joerg Roedel authored Oct 09, 2009

This patch adds a tracepoint for every #vmexit we get from a
nested guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d8cabddf

KVM: SVM: Add tracepoint for nested vmrun · 0ac406de

Joerg Roedel authored Oct 09, 2009

This patch adds a dedicated kvm tracepoint for a nested
vmrun.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0ac406de

KVM: SVM: Move INTR vmexit out of atomic code · cd3ff653

Joerg Roedel authored Oct 09, 2009

The nested SVM code emulates a #vmexit caused by a request
to open the irq window right in the request function. This
is a bug because the request function runs with preemption
and interrupts disabled but the #vmexit emulation might
sleep. This can cause a schedule()-while-atomic bug and is
fixed with this patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cd3ff653

KVM: SVM: Notify nested hypervisor of lost event injections · 8d23c466

Alexander Graf authored Oct 09, 2009

If event_inj is valid on a #vmexit the host CPU would write
the contents to exit_int_info, so the hypervisor knows that
the event wasn't injected.

We don't do this in nested SVM by now which is a bug and
fixed by this patch.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8d23c466

KVM: x86: include pvclock MSRs in msrs_to_save · e3267cbb

Glauber Costa authored Oct 06, 2009

For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves then to the beginning of the list, and skip testing them.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e3267cbb

KVM: x86: Rework guest single-step flag injection and filtering · 91586a3b

Jan Kiszka authored Oct 05, 2009

Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

91586a3b

KVM: x86: disable paravirt mmu reporting · a68a6a72

Marcelo Tosatti authored Oct 01, 2009

Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a68a6a72

KVM: x86: Refactor guest debug IOCTL handling · 355be0b9

Jan Kiszka authored Oct 03, 2009

Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

355be0b9

KVM: remove pre_task_link setting in save_state_to_tss16 · 201d945b

Juan Quintela authored Sep 30, 2009

Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37
  Author: Gleb Natapov <gleb@redhat.com>
  Date:   Mon Mar 30 16:03:24 2009 +0300

    KVM: Fix task switch back link handling.

CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

201d945b

KVM: Fix hotplug of CPUs · 3230bb47

Zachary Amsden authored Sep 29, 2009

Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.

Backend was not allocating enough structure for all possible CPUs, so
new CPUs coming online could not be hardware enabled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3230bb47

KVM: Fix printk name error in svm.c · e6732a5a

Zachary Amsden authored Sep 29, 2009

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e6732a5a

KVM: Kill the confusing tsc_ref_khz and ref_freq variables · 0cca7907

Zachary Amsden authored Sep 29, 2009

They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.

On such machines, when a new CPU online is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0cca7907

KVM: Separate timer intialization into an indepedent function · b820cc0c
Zachary Amsden authored Sep 29, 2009
```
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
```
b820cc0c

KVM: fix lock imbalance in kvm_*_irq_source_id() · 0c6ddceb

Jiri Slaby authored Sep 25, 2009

Stanse found 2 lock imbalances in kvm_request_irq_source_id and
kvm_free_irq_source_id. They omit to unlock kvm->irq_lock on fail paths.

Fix that by adding unlock labels at the end of the functions and jump
there from the fail paths.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0c6ddceb

KVM: SVM: Remove remaining occurences of rdtscll · e935d48e

Joerg Roedel authored Sep 16, 2009

This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e935d48e

KVM: SVM: don't copy exit_int_info on nested vmrun · 33527ad7

Joerg Roedel authored Sep 16, 2009

The exit_int_info field is only written by the hardware and
never read. So it does not need to be copied on a vmrun
emulation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

33527ad7

KVM: SVM: reorganize svm_interrupt_allowed · 7fcdb510

Joerg Roedel authored Sep 16, 2009

This patch reorganizes the logic in svm_interrupt_allowed to
make it better to read. This is important because the logic
is a lot more complicated with Nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7fcdb510

KVM: remove duplicated #include · bfc33bea

Huang Weiyi authored Sep 16, 2009

Remove duplicated #include('s) in
  arch/x86/kvm/lapic.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bfc33bea

KVM: Activate Virtualization On Demand · 10474ae8

Alexander Graf authored Sep 15, 2009

X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

10474ae8

KVM: SVM: remove needless mmap_sem acquision from nested_svm_map · e8b3433a

Marcelo Tosatti authored Sep 08, 2009

nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

e8b3433a

KVM: VMX: Enhance invalid guest state emulation · 80ced186

Mohammed Gamal authored Sep 01, 2009

- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

80ced186

KVM: x86 emulator: Add pusha and popa instructions · abcf14b5

Mohammed Gamal authored Sep 01, 2009

This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting
MINIX with invalid guest state emulation on.

[marcelo: remove unused variable]
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

abcf14b5

KVM: x86 emulator: Add missing decoder flags for 'or' instructions · 94677e61

Mohammed Gamal authored Aug 28, 2009

Add missing decoder flags for or instructions (0xc-0xd).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

94677e61

KVM: Move assigned device code to own file · bfd99ff5
Avi Kivity authored Aug 26, 2009
```
Signed-off-by: Avi Kivity <avi@redhat.com>
```
bfd99ff5
KVM: Return -ENOTTY on unrecognized ioctls · 367e1319
Avi Kivity authored Aug 26, 2009
```
Not the incorrect -EINVAL.
Signed-off-by: Avi Kivity <avi@redhat.com>
```
367e1319

KVM: Drop kvm->irq_lock lock from irq injection path · 680b3648

Gleb Natapov authored Aug 24, 2009

The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

680b3648

KVM: Move IO APIC to its own lock · eba0226b

Gleb Natapov authored Aug 24, 2009

The allows removal of irq_lock from the injection path.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

eba0226b

KVM: Convert irq notifiers lists to RCU locking · 280aa177

Gleb Natapov authored Aug 24, 2009

Use RCU locking for mask/ack notifiers lists.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

280aa177