Commits · 681405bfc73a2717ae9b03b2bad465b009106f31 · Kirill Smelkov / linux

10 Sep, 2009 40 commits

KVM: Drop useless atomic test from timer function · 681405bf

Jan Kiszka authored Jun 09, 2009

The current code tries to optimize the setting of
KVM_REQ_PENDING_TIMER but tests the result of atomic_inc_and_test, and
that result is always the same unless pending had the invalid value of
-1 on entry. This patch drops the test part, preserving the original
semantics but expressing them less confusingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
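
To illustrate why the test was useless, here is a minimal userspace sketch using C11 atomics rather than the kernel API. In the kernel, atomic_inc_and_test() reports whether the incremented value is zero, so for a counter that starts at 0 or above the outcome never varies. The names inc_and_test and count_true are hypothetical stand-ins, not code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's atomic_inc_and_test():
 * increment, then report whether the new value is zero. */
static bool inc_and_test(atomic_int *v)
{
    /* atomic_fetch_add returns the old value, so new == old + 1. */
    return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Hypothetical helper: count how many starting values make the test fire. */
static int count_true(const int *starts, int n)
{
    int hits = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        atomic_int pending;

        atomic_init(&pending, starts[i]);
        if (inc_and_test(&pending))
            hits++;
    }
    return hits;
}
```

Only a starting value of -1 makes the test fire, which is why a plain increment expresses the same behaviour more clearly.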


KVM: Fix racy event propagation in timer · f7104db2

Jan Kiszka authored Jun 09, 2009

Minor issue that likely had no practical relevance: the kvm timer
function so far incremented the pending counter and then may reset it
again to 1 in case reinjection was disabled. This opened a small racy
window with the corresponding VCPU loop that may have happened to run
on another (real) CPU and already consumed the value.

Fix it by skipping the increment when pending is already > 0.
This opens a different race window, but it can only rarely cause lost
events, and only when we do not care about them anyway (!reinject).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
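
The rule described above can be sketched in userspace C as follows. This is only an illustration under assumptions: the struct and function names are hypothetical, and the sketch assumes the increment is skipped only when reinjection is disabled, as the "(!reinject)" remark suggests.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified timer state. */
struct ktimer_sketch {
    atomic_int pending;  /* timer events not yet consumed by the vcpu loop */
    bool reinject;       /* true: every missed tick must be delivered */
};

/* Returns true when a new event was recorded and the vcpu should be
 * notified. The check and the increment are separate steps, which is the
 * "different race window" the message accepts for the !reinject case. */
static bool timer_fired(struct ktimer_sketch *t)
{
    assert(atomic_load(&t->pending) >= 0);

    if (t->reinject || atomic_load(&t->pending) == 0) {
        atomic_fetch_add(&t->pending, 1);
        return true;
    }
    return false;  /* an event is already pending and coalescing is allowed */
}
```

With reinjection disabled the counter never climbs above 1, so there is no need to reset it afterwards.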


KVM: Optimize searching for highest IRR · 33e4c686

Gleb Natapov authored Jun 11, 2009

Most of the time IRR is empty, so instead of scanning the whole IRR on
each VM entry keep a variable that tells us if IRR is not empty. IRR
will have to be scanned twice on each IRQ delivery, but this is much
more rare than VM entry.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
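
A rough userspace sketch of the idea, assuming a 256-bit IRR stored as eight 32-bit words. The struct layout and the names irr_pending, irr_scan, irr_highest, irr_set and irr_clear are illustrative assumptions, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRR_WORDS 8  /* 256 vectors, 32 per word */

struct lapic_sketch {
    uint32_t irr[IRR_WORDS];
    bool irr_pending;  /* false only when the IRR is known to be empty */
};

/* Full scan: highest set vector, or -1 if none. */
static int irr_scan(const struct lapic_sketch *a)
{
    for (int w = IRR_WORDS - 1; w >= 0; w--) {
        if (a->irr[w]) {
            int bit = 31;

            while (!(a->irr[w] & (UINT32_C(1) << bit)))
                bit--;
            return w * 32 + bit;
        }
    }
    return -1;
}

/* Called on every VM entry: usually returns without scanning. */
static int irr_highest(const struct lapic_sketch *a)
{
    if (!a->irr_pending)
        return -1;
    return irr_scan(a);
}

static void irr_set(struct lapic_sketch *a, int vec)
{
    assert(vec >= 0 && vec < IRR_WORDS * 32);
    a->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
    a->irr_pending = true;
}

/* Called on delivery: the second scan recomputes the flag. */
static void irr_clear(struct lapic_sketch *a, int vec)
{
    assert(vec >= 0 && vec < IRR_WORDS * 32);
    a->irr[vec / 32] &= ~(UINT32_C(1) << (vec % 32));
    a->irr_pending = irr_scan(a) != -1;
}
```

The cost moves from the common path (VM entry) to the rare one (delivery), which performs one scan to find the vector and one to refresh the flag.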


KVM: Replace pending exception by PF if it happens serially · 6edf14d8

Gleb Natapov authored Jun 11, 2009

Replace the previous exception with the new one, in the hope that
instruction re-execution will regenerate the lost exception.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: VMX: conditionally disable 2M pages · 54dee993

Marcelo Tosatti authored Jun 11, 2009

Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.

[avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
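
The condition can be sketched as a small predicate. VMX_EPT_2MB_PAGE_BIT and its bit position come from the message above; the function signature is a hypothetical simplification that takes the capability MSR value as a plain argument instead of reading it from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMX_EPT_2MB_PAGE_BIT (UINT64_C(1) << 16)
static_assert(VMX_EPT_2MB_PAGE_BIT == 0x10000, "bit 16, as stated in the message");

/* Positive logic, following the s/largepages_disabled/largepages_enabled/
 * note: 2M pages stay enabled unless EPT is in use and the capability
 * bit is clear. */
static bool largepages_enabled(bool ept_enabled, uint64_t ept_vpid_cap)
{
    if (ept_enabled && !(ept_vpid_cap & VMX_EPT_2MB_PAGE_BIT))
        return false;
    return true;
}
```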


KVM: VMX: EPT misconfiguration handler · 68f89400

Marcelo Tosatti authored Jun 11, 2009

Handler for EPT misconfiguration which checks for valid state
in the shadow pagetables, printing the spte on each level.

The separate WARN_ONs are useful for kerneloops.org.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU: add kvm_mmu_get_spte_hierarchy helper · 94d8b056

Marcelo Tosatti authored Jun 11, 2009

Required by EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU: make for_each_shadow_entry aware of largepages · 4d88954d

Marcelo Tosatti authored Jun 11, 2009

This way there is no need to add explicit checks in every
for_each_shadow_entry user.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits · e799794e

Marcelo Tosatti authored Jun 11, 2009

Required for EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Move performance counter MSR access interception to generic x86 path · 71db6023

Andre Przywara authored Jun 12, 2009

The performance counter MSRs are different for AMD and Intel CPUs and they
are chosen mainly by the CPUID vendor string. This patch catches writes to
all addresses (regardless of VMX/SVM path) and handles them in the generic
MSR handler routine. Writing a 0 into the event select register is something
we perfectly emulate ;-), so don't print out a warning to dmesg in this
case.
This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU audit: largepage handling · 2920d728

Marcelo Tosatti authored Jun 10, 2009

Make the audit code aware of largepages.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU audit: audit_mappings tweaks · 2aaf65e8

Marcelo Tosatti authored Jun 10, 2009

- Fail early in case gfn_to_pfn returns a pfn that satisfies is_error_pfn.
- For the pre pte write case, avoid spurious "gva is valid but spte is notrap"
  messages (the emulation code does the guest write first, so this particular
  case is OK).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU audit: nontrapping ptes in nonleaf level · 48fc0317

Marcelo Tosatti authored Jun 10, 2009

It is valid to set non leaf sptes as notrap.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU audit: update audit_write_protection · e58b0f9e

Marcelo Tosatti authored Jun 10, 2009

- Unsync pages contain writable sptes in the rmap.
- rmaps do not exclusively contain writable sptes anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU audit: update count_writable_mappings / count_rmaps · 08a3732b

Marcelo Tosatti authored Jun 10, 2009

Under testing, count_writable_mappings returns a value that is 2
larger than what count_rmaps returns.

The suspicion is that one of the two functions is counting a duplicate
(either positively or negatively).

Modifying check_writable_mappings_rmap to check for rmap existence on
all present MMU pages fails to trigger an error, which should keep Avi
happy.

Also introduce mmu_spte_walk to invoke a callback on all present sptes visible
to the current vcpu, which might be useful in the future.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU: introduce is_last_spte helper · 776e6633

Marcelo Tosatti authored Jun 10, 2009

Hiding some of the last largepage / level interaction (which is useful
for gbpages and for zero based levels).

Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
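
As a hedged sketch of what such a helper might look like: an entry is the last one in a walk either at the page-table level or at the directory level when it maps a large page. The level numbering and the use of bit 7 as the page-size flag follow the x86 page-table format; the exact body is an assumption, not a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_TABLE_LEVEL 1
#define PT_DIRECTORY_LEVEL  2
#define PT_PAGE_SIZE_MASK   (UINT64_C(1) << 7)  /* x86 PS bit: large page */

/* True when the walk ends at this entry, so callers need not repeat the
 * largepage / level check themselves. */
static bool is_last_spte(uint64_t spte, int level)
{
    assert(level >= PT_PAGE_TABLE_LEVEL);

    if (level == PT_PAGE_TABLE_LEVEL)
        return true;
    return level == PT_DIRECTORY_LEVEL && (spte & PT_PAGE_SIZE_MASK) != 0;
}
```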


KVM: Return to userspace on emulation failure · 3f5d18a9

Avi Kivity authored Jun 11, 2009

Instead of mindlessly retrying to execute the instruction, report the
failure to userspace.
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Use macro to iterate over vcpus. · 988a2cae

Gleb Natapov authored Jun 09, 2009

[christian: remove unused variables on s390]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Break dependency between vcpu index in vcpus array and vcpu_id. · 73880c80

Gleb Natapov authored Jun 09, 2009

Archs are free to use vcpu_id as they see fit. For x86 it is used as
the vcpu's apic id. A new ioctl is added to configure the boot vcpu id,
which was assumed to be 0 until now.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
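
Once the array index and vcpu_id are decoupled, a BSP check in the spirit of kvm_vcpu_is_bsp() has to compare against a configurable id instead of 0. The sketch below uses hypothetical, simplified structs; the field name bsp_vcpu_id is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified VM and vcpu state. */
struct kvm_sketch {
    int bsp_vcpu_id;  /* set by userspace; previously assumed to be 0 */
};

struct vcpu_sketch {
    const struct kvm_sketch *kvm;
    int vcpu_id;      /* arch-defined id (the apic id on x86), not an array index */
};

static bool vcpu_is_bsp(const struct vcpu_sketch *vcpu)
{
    assert(vcpu != NULL && vcpu->kvm != NULL);
    return vcpu->vcpu_id == vcpu->kvm->bsp_vcpu_id;
}
```

A vcpu stored at index 0 of the array is therefore no longer guaranteed to be the boot processor.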


KVM: Use pointer to vcpu instead of vcpu_id in timer code. · 1ed0ce00

Gleb Natapov authored Jun 09, 2009

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Introduce kvm_vcpu_is_bsp() function. · c5af89b6

Gleb Natapov authored Jun 09, 2009

Use it instead of open-coding the "vcpu_id zero is BSP" assumption.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU: s/shadow_pte/spte/ · d555c333

Avi Kivity authored Jun 10, 2009

We use shadow_pte and spte inconsistently; switch to the shorter spelling.

Rename set_shadow_pte() to __set_spte() to avoid a conflict with the
existing set_spte(), and to indicate its low-level nature.
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte · 43a3795a

Avi Kivity authored Jun 10, 2009

Since the guest and host ptes can have wildly different formats, adjust
the pte accessor names to indicate which type of pte they operate on.

No functional changes.
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: MMU: Fix is_dirty_pte() · 439e218a

Avi Kivity authored Jun 10, 2009

is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid
shadow_dirty_mask and use PT_DIRTY_MASK instead.

Misdetecting dirty pages could lead to unnecessarily setting the dirty bit
under EPT.
Signed-off-by: Avi Kivity <avi@redhat.com>
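
A small sketch of the distinction, assuming the x86 guest pte format in which the dirty flag is bit 6. The function names and the example shadow mask value of 0 are illustrative assumptions; the point is only that a guest pte must be tested against the fixed guest mask, not against whatever mask the shadow format uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_DIRTY_MASK (UINT64_C(1) << 6)
static_assert(PT_DIRTY_MASK == 0x40, "x86 guest ptes keep the dirty flag in bit 6");

/* Correct for guest ptes: their format is fixed by the architecture. */
static bool is_dirty_gpte(uint64_t gpte)
{
    return (gpte & PT_DIRTY_MASK) != 0;
}

/* For comparison: testing a guest pte against a mask that belongs to a
 * different format gives an answer that depends on that mask. */
static bool is_dirty_with_mask(uint64_t pte, uint64_t mask)
{
    return (pte & mask) != 0;
}
```

With an assumed shadow mask of 0, the second function never reports a guest page as dirty, which is the kind of misdetection the fix avoids.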


KVM: VMX: Move rmode structure to vmx-specific code · 7ffd92c5

Avi Kivity authored Jun 09, 2009

rmode is only used in vmx, so move it to vmx.c.
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Reorder ioctls in kvm.h · 6a4a9839

Avi Kivity authored Jun 09, 2009

Somehow the VM ioctls got unsorted; re-sort them.
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: VMX: Support Unrestricted Guest feature · 3a624e29

Nitin A Kamble authored Jun 08, 2009

The "Unrestricted Guest" feature has been added to the VMX specification.
Intel Westmere and later processors will support this feature.

It allows kvm guests to run real mode and unpaged mode
code natively in VMX mode when EPT is turned on. With the
unrestricted guest there is no need to emulate the guest real mode code
in the vm86 container or in the emulator. Guest big real mode
code also works as it does natively.

This patch enhances KVM to use the unrestricted guest feature
if available on the processor. It also adds a new kernel/module
parameter to disable the unrestricted guest feature at boot time.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: switch irq injection/acking data structures to irq_lock · fa40a821

Marcelo Tosatti authored Jun 04, 2009

Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:

CPU A                               CPU B
kvm_vm_ioctl_deassign_dev_irq()
  mutex_lock(&kvm->lock);            worker_thread()
  -> kvm_deassign_irq()                -> kvm_assigned_dev_interrupt_work_handler()
    -> deassign_host_irq()               mutex_lock(&kvm->lock);
      -> cancel_work_sync() [blocked]

[gleb: fix ia64 path]
Reported-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: introduce irq_lock, use it to protect ioapic · 60eead79

Marcelo Tosatti authored Jun 04, 2009

Introduce irq_lock, and use it to protect ioapic data structures.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: move coalesced_mmio locking to its own device · 64a2268d

Marcelo Tosatti authored Jun 04, 2009

Move coalesced_mmio locking to its own device, instead of relying on
kvm->lock.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Grab pic lock in kvm_pic_clear_isr_ack · 9f4cc127

Marcelo Tosatti authored Jun 04, 2009

isr_ack is protected by kvm_pic->lock.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Cleanup LAPIC interface · 238adc77

Jan Kiszka authored Jun 05, 2009

None of the interface services the LAPIC emulation provides need to be
exported to modules, and kvm_lapic_get_base is even totally unused
today.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: ppc: e500: Add MMUCFG and PVR emulation · 06579dd9

Liu Yu authored Jun 05, 2009

The latest kernel has started to use these two registers.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: ppc: e500: Directly pass pvr to guest · 5b7c1a2c

Liu Yu authored Jun 05, 2009

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: ppc: e500: Move to Book-3e MMU definitions · 0cfb50e5

Liu Yu authored Jun 05, 2009

According to commit 70fe3af8.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Calculate available entries in coalesced mmio ring · 105f8d40

Avi Kivity authored Jun 04, 2009

Instead of checking whether we'll wrap around, calculate how many entries
are available, and check whether we have enough (just one) for the pending
mmio.

By itself, this doesn't change anything, but it paves the way for making
this function lockless.
Signed-off-by: Avi Kivity <avi@redhat.com>
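
The calculation can be sketched for a generic ring in which one slot is always left unused, so that equal indices mean "empty". The ring size, struct and function names below are hypothetical, and the formula is written to avoid unsigned underflow rather than copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_MAX 8u  /* hypothetical ring size */

struct ring_sketch {
    uint32_t first;  /* consumer index: next entry to read */
    uint32_t last;   /* producer index: next entry to write */
};

/* Number of entries that can still be written. */
static uint32_t ring_avail(const struct ring_sketch *r)
{
    assert(r->first < RING_MAX && r->last < RING_MAX);
    /* Adding RING_MAX first keeps the subtraction from wrapping below 0. */
    return (r->first + RING_MAX - r->last - 1) % RING_MAX;
}

/* A pending mmio needs exactly one free entry. */
static bool ring_has_room(const struct ring_sketch *r)
{
    return ring_avail(r) >= 1;
}
```

Expressing the check as "how much room is left" yields a single value derived from the two indices, which fits the stated goal of later making the function lockless.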


KVM: VMX: Fix reporting of unhandled EPT violations · 596ae895

Avi Kivity authored Jun 03, 2009

Instead of returning -ENOTSUPP, exit normally but indicate the hardware
exit reason.
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: Cache pdptrs · 6de4f3ad

Avi Kivity authored May 31, 2009

Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx,
guest memory access on svm) extract them on demand.
Signed-off-by: Avi Kivity <avi@redhat.com>
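
On-demand extraction is essentially a lazily filled cache with a validity bit. The sketch below is a userspace illustration under assumptions: the struct, the regs_avail bitmap, the loads counter and the fake values stand in for the real vcpu state and for the expensive VMCS or guest-memory access.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_PDPTR = 0 };  /* hypothetical index into the validity bitmap */

struct vcpu_cache_sketch {
    uint64_t pdptrs[4];
    uint32_t regs_avail;  /* bit set: the cached copy is valid */
    unsigned loads;       /* counts slow reloads, for illustration only */
};

/* Stand-in for the expensive access (vmcs read on vmx, guest memory on svm). */
static void load_pdptrs_slow(struct vcpu_cache_sketch *v)
{
    for (int i = 0; i < 4; i++)
        v->pdptrs[i] = (UINT64_C(0x1000) * (uint64_t)(i + 1)) | 1;
    v->loads++;
}

/* Readers go through this accessor; the reload happens at most once
 * per invalidation. */
static uint64_t pdptr_read(struct vcpu_cache_sketch *v, int idx)
{
    assert(idx >= 0 && idx < 4);

    if (!(v->regs_avail & (1u << REG_PDPTR))) {
        load_pdptrs_slow(v);
        v->regs_avail |= 1u << REG_PDPTR;
    }
    return v->pdptrs[idx];
}

/* After the guest has run, the cached copy may be stale: just mark it so. */
static void on_guest_exit(struct vcpu_cache_sketch *v)
{
    v->regs_avail &= ~(1u << REG_PDPTR);
}
```

An exit that never touches the PDPTRs costs only the bit clear; the slow load is paid by the first reader, if there is one.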


KVM: VMX: Simplify pdptr and cr3 management · 8f5d549f

Avi Kivity authored May 31, 2009

Instead of reading the PDPTRs from memory after every exit (which is slow
and wrong, as the PDPTRs are stored on the cpu), sync the PDPTRs from
memory to the VMCS before entry, and from the VMCS to memory after exit.
Do the same for cr3.
Signed-off-by: Avi Kivity <avi@redhat.com>


KVM: VMX: Avoid duplicate ept tlb flush when setting cr3 · 2d84e993

Avi Kivity authored May 31, 2009

vmx_set_cr3() will call vmx_tlb_flush(), which will flush the ept context.
So there is no need to call ept_sync_context() explicitly.
Signed-off-by: Avi Kivity <avi@redhat.com>
