Commits · e99f0507125f45b723a9069e9e854c3c4758e7ba · nexedi / linux

10 Sep, 2009 40 commits

KVM: x86 emulator: Prepare for emulation of syscall instructions · e99f0507

Andre Przywara authored Jun 17, 2009

Add the flags needed for syscall, sysenter and sysexit to the opcode table.
Catch (but for now ignore) the opcodes in the emulation switch/case.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e99f0507
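
As a rough illustration of what "catching the opcodes in the emulation switch/case" can look like, here is a minimal sketch. It is not the emulator's code: the enum and the function name are hypothetical. Only the opcode bytes are architectural (SYSCALL is 0F 05, SYSENTER is 0F 34, SYSEXIT is 0F 35).

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch; not the kernel's own. */
enum insn_kind { INSN_OTHER, INSN_SYSCALL, INSN_SYSENTER, INSN_SYSEXIT };

/* Classify the second byte of a two-byte (0x0F-prefixed) opcode. */
static enum insn_kind classify_twobyte(uint8_t second_byte)
{
    switch (second_byte) {
    case 0x05: return INSN_SYSCALL;   /* 0F 05 */
    case 0x34: return INSN_SYSENTER;  /* 0F 34 */
    case 0x35: return INSN_SYSEXIT;   /* 0F 35 */
    default:   return INSN_OTHER;     /* everything else is untouched */
    }
}
```

A caller would first match the 0x0F escape byte and then pass the following byte to `classify_twobyte()`.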

KVM: x86 emulator: Add missing EFLAGS bit definitions · b1d86143

Andre Przywara authored Jun 17, 2009

Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b1d86143

KVM: Allow emulation of syscalls instructions on #UD · 0cb5762e

Andre Przywara authored Jun 17, 2009

Add the opcodes for syscall, sysenter and sysexit to the list of instructions
handled by the undefined opcode handler.
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0cb5762e

KVM: convert custom marker based tracing to event traces · 229456fc

Marcelo Tosatti authored Jun 17, 2009

This allows use of the powerful ftrace infrastructure.

See Documentation/trace/ for usage information.

[avi, stephen: various build fixes]
[sheng: fix control register breakage]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

229456fc

KVM: SVM: Improve nested interrupt injection · 219b65dc

Alexander Graf authored Jun 15, 2009

While trying to get Hyper-V running, I realized that the interrupt injection
mechanisms that are in place right now are not 100% correct.

This patch makes nested SVM's interrupt injection behave more like on a
real machine.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

219b65dc

KVM: SVM: Implement INVLPGA · ff092385

Alexander Graf authored Jun 15, 2009

SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

ff092385

KVM: Implement MSRs used by Hyper-V · 3c5d0a44

Alexander Graf authored Jun 15, 2009

Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.

But let's be nice today and have it its way, because otherwise it fails
terribly.

[jaswinder: fix build for linux-next changes]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3c5d0a44

x86: Add definition for IGNNE MSR · 0367b433

Alexander Graf authored Jun 15, 2009

Hyper-V accesses MSR_IGNNE while running under KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

0367b433

KVM: SVM: Don't save/restore host cr2 · b3dbf89e

Avi Kivity authored Jun 16, 2009

The host never reads cr2 in process context, so we are free to clobber it. The
vmx code does this, so we can safely remove the save/restore code.
Signed-off-by: Avi Kivity <avi@redhat.com>

b3dbf89e

KVM: VMX: Only reload guest cr2 if different from host cr2 · d3edefc0

Avi Kivity authored Jun 16, 2009

cr2 changes only rarely, and writing it is expensive.  Avoid the costly cr2
writes by checking if it does not already hold the desired value.

Shaves 70 cycles off the vmexit latency.
Signed-off-by: Avi Kivity <avi@redhat.com>

d3edefc0
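
The "check before write" idea can be sketched in user space as follows. This is an illustration only: the register is simulated with a plain variable, and every name here (`fake_cr2`, `read_cr2_sim`, `write_cr2_sim`, `load_guest_cr2`) is hypothetical rather than taken from the patch.

```c
#include <stdint.h>

/* Stand-in for the hardware register; real code would use privileged
 * instructions instead of a variable. */
static uint64_t fake_cr2;
static unsigned long cr2_writes;   /* counts the (expensive) writes */

static uint64_t read_cr2_sim(void)
{
    return fake_cr2;
}

static void write_cr2_sim(uint64_t value)
{
    fake_cr2 = value;
    cr2_writes++;
}

/* Write only when the register does not already hold the wanted value. */
static void load_guest_cr2(uint64_t guest_cr2)
{
    if (read_cr2_sim() != guest_cr2)
        write_cr2_sim(guest_cr2);
}
```

Calling `load_guest_cr2()` repeatedly with the same value performs a single simulated write, which is the saving the message describes.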

KVM: Drop useless atomic test from timer function · 681405bf

Jan Kiszka authored Jun 09, 2009

The current code tries to optimize the setting of
KVM_REQ_PENDING_TIMER but used atomic_inc_and_test - which always
returns true unless pending had the invalid value of -1 on entry. This
patch drops the test part preserving the original semantic but
expressing it less confusingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

681405bf
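
For readers unfamiliar with the helper: the kernel's atomic_inc_and_test() increments the counter and reports whether the new value is zero, so for a counter that is never negative its result is the same on every call. The sketch below models that semantic with C11 atomics. `inc_and_test` is a hypothetical user-space stand-in, not the kernel function.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the increment-and-test semantic: add one, then report
 * whether the resulting value is zero. */
static bool inc_and_test(atomic_int *v)
{
    /* atomic_fetch_add returns the value held before the addition */
    return atomic_fetch_add(v, 1) + 1 == 0;
}
```

Starting from 0 or any positive value, `inc_and_test()` returns false. Only a starting value of -1 makes it return true.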

KVM: Fix racy event propagation in timer · f7104db2

Jan Kiszka authored Jun 09, 2009

Minor issue that likely had no practical relevance: the kvm timer
function so far incremented the pending counter and then may reset it
again to 1 in case reinjection was disabled. This opened a small racy
window with the corresponding VCPU loop that may have happened to run
on another (real) CPU and already consumed the value.

Fix it by skipping the increment when pending is already > 0.
This opens a different race window, but it can only rarely cause lost
events, and only when we do not care about them anyway (!reinject).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f7104db2

KVM: Optimize searching for highest IRR · 33e4c686

Gleb Natapov authored Jun 11, 2009

Most of the time IRR is empty, so instead of scanning the whole IRR on
each VM entry keep a variable that tells us if IRR is not empty. IRR
will have to be scanned twice on each IRQ delivery, but this is much
more rare than VM entry.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

33e4c686
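
The optimization can be sketched as a bitmap plus an "is anything pending" hint. This is a simplified, single-threaded model under assumed names (`struct irr_state`, `irr_set`, `irr_clear`, `irr_find_highest`). The real code has to deal with concurrent updates, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRR_WORDS 8   /* 8 x 32 bits = 256 interrupt vectors */

struct irr_state {
    uint32_t irr[IRR_WORDS];
    bool irr_pending;   /* false means the bitmap is known to be empty */
};

/* Slow path: scan from the top word down; returns -1 if no bit is set. */
static int irr_scan(const struct irr_state *s)
{
    for (int w = IRR_WORDS - 1; w >= 0; w--) {
        if (!s->irr[w])
            continue;
        for (int b = 31; b >= 0; b--)
            if (s->irr[w] & (UINT32_C(1) << b))
                return w * 32 + b;
    }
    return -1;
}

static void irr_set(struct irr_state *s, int vec)
{
    s->irr_pending = true;
    s->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/* Fast path: skip the scan entirely when nothing is pending. */
static int irr_find_highest(const struct irr_state *s)
{
    if (!s->irr_pending)
        return -1;
    return irr_scan(s);
}

/* Clearing a bit requires a rescan to learn whether anything is left. */
static void irr_clear(struct irr_state *s, int vec)
{
    s->irr[vec / 32] &= ~(UINT32_C(1) << (vec % 32));
    s->irr_pending = irr_scan(s) != -1;
}
```

In this model a delivery costs one scan to pick the vector and a second one when its bit is cleared, while the common empty case costs a single flag test.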

KVM: Replace pending exception by PF if it happens serially · 6edf14d8

Gleb Natapov authored Jun 11, 2009

Replace the previous exception with the new one, in the hope that instruction
re-execution will regenerate the lost exception.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6edf14d8

KVM: VMX: conditionally disable 2M pages · 54dee993

Marcelo Tosatti authored Jun 11, 2009

Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.

[avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

54dee993
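
The condition in the message reduces to a single bit test. A minimal sketch follows, assuming the capability MSR value has already been read into a 64-bit variable. The function name `can_use_2m_pages` is hypothetical; the bit position (16) comes from the message itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 16 of MSR_IA32_VMX_EPT_VPID_CAP, as stated in the commit message. */
#define VMX_EPT_2MB_PAGE_BIT (UINT64_C(1) << 16)

/* 2M pages are ruled out only when EPT is in use and the capability
 * bit is clear; without EPT the bit is not consulted. */
static bool can_use_2m_pages(uint64_t ept_vpid_cap, bool ept_enabled)
{
    if (ept_enabled && !(ept_vpid_cap & VMX_EPT_2MB_PAGE_BIT))
        return false;
    return true;
}
```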

KVM: VMX: EPT misconfiguration handler · 68f89400

Marcelo Tosatti authored Jun 11, 2009

Handler for EPT misconfiguration which checks for valid state
in the shadow pagetables, printing the spte on each level.

The separate WARN_ONs are useful for kerneloops.org.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

68f89400

KVM: MMU: add kvm_mmu_get_spte_hierarchy helper · 94d8b056

Marcelo Tosatti authored Jun 11, 2009

Required by EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

94d8b056

KVM: MMU: make for_each_shadow_entry aware of largepages · 4d88954d

Marcelo Tosatti authored Jun 11, 2009

This way there is no need to add explicit checks in every
for_each_shadow_entry user.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4d88954d

KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits · e799794e

Marcelo Tosatti authored Jun 11, 2009

Required for EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e799794e

KVM: Move performance counter MSR access interception to generic x86 path · 71db6023

Andre Przywara authored Jun 12, 2009

The performance counter MSRs are different for AMD and Intel CPUs and they
are chosen mainly by the CPUID vendor string. This patch catches writes to
all addresses (regardless of VMX/SVM path) and handles them in the generic
MSR handler routine. Writing a 0 into the event select register is something
we perfectly emulate ;-), so don't print out a warning to dmesg in this
case.
This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

71db6023

KVM: MMU audit: largepage handling · 2920d728

Marcelo Tosatti authored Jun 10, 2009

Make the audit code aware of largepages.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2920d728

KVM: MMU audit: audit_mappings tweaks · 2aaf65e8

Marcelo Tosatti authored Jun 10, 2009

- Fail early in case gfn_to_pfn returns is_error_pfn.
- For the pre pte write case, avoid spurious "gva is valid but spte is notrap"
  messages (the emulation code does the guest write first, so this particular
  case is OK).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2aaf65e8

KVM: MMU audit: nontrapping ptes in nonleaf level · 48fc0317

Marcelo Tosatti authored Jun 10, 2009

It is valid to set non leaf sptes as notrap.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

48fc0317

KVM: MMU audit: update audit_write_protection · e58b0f9e

Marcelo Tosatti authored Jun 10, 2009

- Unsync pages contain writable sptes in the rmap.
- rmaps do not exclusively contain writable sptes anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e58b0f9e

KVM: MMU audit: update count_writable_mappings / count_rmaps · 08a3732b

Marcelo Tosatti authored Jun 10, 2009

Under testing, count_writable_mappings returns a value that is larger by 2
than what count_rmaps returns.

The suspicion is that one of the two functions is counting a duplicate (either
positively or negatively).

Modifying check_writable_mappings_rmap to check for rmap existence on
all present MMU pages fails to trigger an error, which should keep Avi
happy.

Also introduce mmu_spte_walk to invoke a callback on all present sptes visible
to the current vcpu; it might be useful in the future.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

08a3732b

KVM: MMU: introduce is_last_spte helper · 776e6633

Marcelo Tosatti authored Jun 10, 2009

Hiding some of the last largepage / level interaction (which is useful
for gbpages and for zero based levels).

Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

776e6633

KVM: Return to userspace on emulation failure · 3f5d18a9

Avi Kivity authored Jun 11, 2009

Instead of mindlessly retrying to execute the instruction, report the
failure to userspace.
Signed-off-by: Avi Kivity <avi@redhat.com>

3f5d18a9

KVM: Use macro to iterate over vcpus. · 988a2cae

Gleb Natapov authored Jun 09, 2009

[christian: remove unused variables on s390]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

988a2cae

KVM: Break dependency between vcpu index in vcpus array and vcpu_id. · 73880c80

Gleb Natapov authored Jun 09, 2009

Archs are free to use vcpu_id as they see fit. For x86 it is used as the
vcpu's APIC ID. A new ioctl is added to configure the boot vcpu id, which was
assumed to be 0 until now.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

73880c80

KVM: Use pointer to vcpu instead of vcpu_id in timer code. · 1ed0ce00

Gleb Natapov authored Jun 09, 2009

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1ed0ce00

KVM: Introduce kvm_vcpu_is_bsp() function. · c5af89b6

Gleb Natapov authored Jun 09, 2009

Use it instead of open-coding the "vcpu_id zero is BSP" assumption.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c5af89b6
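
A helper of this kind might look like the sketch below once the boot vcpu id is configurable rather than fixed at 0. The struct layouts and names (`vm_sketch`, `vcpu_sketch`, `vcpu_is_bsp`) are assumptions made for illustration and do not reproduce the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-ins for the VM and vcpu structures. */
struct vm_sketch {
    int bsp_vcpu_id;        /* id of the boot processor; no longer assumed 0 */
};

struct vcpu_sketch {
    struct vm_sketch *vm;
    int vcpu_id;
};

/* One place that answers "is this the boot processor?", instead of
 * scattering `vcpu_id == 0` comparisons through the code. */
static bool vcpu_is_bsp(const struct vcpu_sketch *vcpu)
{
    return vcpu->vcpu_id == vcpu->vm->bsp_vcpu_id;
}
```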

KVM: MMU: s/shadow_pte/spte/ · d555c333

Avi Kivity authored Jun 10, 2009

We use shadow_pte and spte inconsistently, switch to the shorter spelling.

Rename set_shadow_pte() to __set_spte() to avoid a conflict with the
existing set_spte(), and to indicate its lowlevelness.
Signed-off-by: Avi Kivity <avi@redhat.com>

d555c333

KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte · 43a3795a

Avi Kivity authored Jun 10, 2009

Since the guest and host ptes can have wildly different formats, adjust
the pte accessor names to indicate which type of pte they operate on.

No functional changes.
Signed-off-by: Avi Kivity <avi@redhat.com>

43a3795a

KVM: MMU: Fix is_dirty_pte() · 439e218a

Avi Kivity authored Jun 10, 2009

is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid
shadow_dirty_mask and use PT_DIRTY_MASK instead.

Misdetecting dirty pages could lead to unnecessarily setting the dirty bit
under EPT.
Signed-off-by: Avi Kivity <avi@redhat.com>

439e218a
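
The distinction can be illustrated with a short sketch. In the x86 page-table format the dirty flag of a guest pte is bit 6, whereas the mask used for shadow ptes is a run-time value that depends on the paging mode in use. The value 0 given to the shadow mask below is an assumption chosen only to make the difference visible, and all `_sketch` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Dirty flag of an x86 guest page-table entry: bit 6. */
#define PT_DIRTY_MASK (UINT64_C(1) << 6)

/* Run-time mask for shadow ptes; 0 here is purely illustrative. */
static uint64_t shadow_dirty_mask_sketch = 0;

/* Correct for guest ptes: test the architectural bit. */
static bool is_dirty_gpte_sketch(uint64_t gpte)
{
    return (gpte & PT_DIRTY_MASK) != 0;
}

/* The mix-up described above: testing a guest pte against the shadow
 * mask gives an answer unrelated to the guest's dirty flag. */
static bool is_dirty_wrong_mask_sketch(uint64_t gpte)
{
    return (gpte & shadow_dirty_mask_sketch) != 0;
}
```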

KVM: VMX: Move rmode structure to vmx-specific code · 7ffd92c5

Avi Kivity authored Jun 09, 2009

rmode is only used in vmx, so move it to vmx.c
Signed-off-by: Avi Kivity <avi@redhat.com>

7ffd92c5

KVM: Reorder ioctls in kvm.h · 6a4a9839

Avi Kivity authored Jun 09, 2009

Somehow the VM ioctls got unsorted; re-sort them.
Signed-off-by: Avi Kivity <avi@redhat.com>

6a4a9839

KVM: VMX: Support Unrestricted Guest feature · 3a624e29

Nitin A Kamble authored Jun 08, 2009

"Unrestricted Guest" feature is added in the VMX specification.
Intel Westmere and later processors will support this feature.

It allows kvm guests to run real mode and unpaged mode code natively in
VMX mode when EPT is turned on. With unrestricted guest there is no need
to emulate the guest real mode code in the vm86 container or in the
emulator. Guest big real mode code also works as it does natively.

This patch enhances KVM to use the unrestricted guest feature if it is
available on the processor. It also adds a new kernel/module parameter
to disable the unrestricted guest feature at boot time.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3a624e29

KVM: switch irq injection/acking data structures to irq_lock · fa40a821

Marcelo Tosatti authored Jun 04, 2009

Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:

CPU A                               CPU B
kvm_vm_ioctl_deassign_dev_irq()
  mutex_lock(&kvm->lock);            worker_thread()
  -> kvm_deassign_irq()                -> kvm_assigned_dev_interrupt_work_handler()
    -> deassign_host_irq()               mutex_lock(&kvm->lock);
      -> cancel_work_sync() [blocked]

[gleb: fix ia64 path]
Reported-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fa40a821

KVM: introduce irq_lock, use it to protect ioapic · 60eead79

Marcelo Tosatti authored Jun 04, 2009

Introduce irq_lock, and use it to protect ioapic data structures.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

60eead79

KVM: move coalesced_mmio locking to its own device · 64a2268d

Marcelo Tosatti authored Jun 04, 2009

Move coalesced_mmio locking to its own device, instead of relying on
kvm->lock.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

64a2268d