Commits · f23d1f4a116038c68df224deae6718fde87d8f0d · Kirill Smelkov / linux

06 Dec, 2012 2 commits

x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary · f23d1f4a

Zhang Yanfei authored Dec 06, 2012

This patch provides a way to VMCLEAR VMCSs related to guests
on all cpus before executing the VMXOFF when doing kdump. This
is used to ensure the VMCSs in the vmcore updated and
non-corrupted.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

f23d1f4a

KVM: MMU: optimize for set_spte · c2193463

Xiao Guangrong authored Dec 04, 2012

There are two cases we need to adjust page size in set_spte:
1): the one is other vcpu creates new sp in the window between mapping_level()
    and acquiring mmu-lock.
2): the another case is the new sp is created by itself (page-fault path) when
    guest uses the target gfn as its page table.

In current code, set_spte drop the spte and emulate the access for these case,
it works not good:
- for the case 1, it may destroy the mapping established by other vcpu, and
  do expensive instruction emulation.
- for the case 2, it may emulate the access even if the guest is accessing
  the page which not used as page table. There is a example, 0~2M is used as
  huge page in guest, in this huge page, only page 3 used as page table, then
  guest read/writes on other pages can cause instruction emulation.

Both of these cases can be fixed by allowing guest to retry the access, it
will refault, then we can establish the mapping by using small page
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c2193463

05 Dec, 2012 5 commits

KVM: x86: Make register state after reset conform to specification · 66f7b72e

Julian Stecklina authored Dec 05, 2012

VMX behaves now as SVM wrt to FPU initialization. Code has been moved to
generic code path. General-purpose registers are now cleared on reset and
INIT.  SVM code properly initializes EDX.
Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

66f7b72e

kvm: don't use bit24 for detecting address-specific invalidation capability · 2b3c5cbc

Zhang Xiantao authored Dec 05, 2012

Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability
reporting, so remove it from KVM to avoid conflicts in future.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

2b3c5cbc

kvm: remove unnecessary bit checking for ept violation · 0307b7b8

Zhang Xiantao authored Dec 05, 2012

Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

0307b7b8

kvm: deliver msi interrupts from irq handler · 78c63440

Michael S. Tsirkin authored Oct 17, 2012

We can deliver certain interrupts, notably MSI,
from atomic context.  Use kvm_set_irq_inatomic,
to implement an irq handler for msi.

This reduces the pressure on scheduler in case
where host and guest irq share a host cpu.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

78c63440

kvm: add kvm_set_irq_inatomic · 01f21880

Michael S. Tsirkin authored Oct 17, 2012

Add an API to inject IRQ from atomic context.
Return EWOULDBLOCK if impossible (e.g. for multicast).
Only MSI is supported ATM.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

01f21880

02 Dec, 2012 1 commit

KVM: x86: Fix uninitialized return code · 45e3cc7d

Jan Kiszka authored Dec 02, 2012

This is a regression caused by 18595411.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

45e3cc7d

30 Nov, 2012 3 commits

KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635

Will Auld authored Nov 29, 2012

CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported

Basic design is to emulate the MSR by allowing reads and writes to a guest
vcpu specific location to store the value of the emulated MSR while adding
the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
is of course as long as the "use TSC counter offsetting" VM-execution control
is enabled as well as the IA32_TSC_ADJUST control.

However, because hardware will only return the TSC + IA32_TSC_ADJUST +
vmsc tsc_offset for a guest process when it does and rdtsc (with the correct
settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
of these three locations. The argument against storing it in the actual MSR
is performance. This is likely to be seldom used while the save/restore is
required on every transition. IA32_TSC_ADJUST was created as a way to solve
some issues with writing TSC itself so that is not an option either.

The remaining option, defined above as our solution has the problem of
returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
done here) as mentioned above. However, more problematic is that storing the
data in vmcs tsc_offset will have a different semantic effect on the system
than does using the actual MSR. This is illustrated in the following example:

The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest
process performs a rdtsc. In this case the guest process will get
TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics
as seen by the guest do not and hence this will not cause a problem.
Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ba904635

KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46

Will Auld authored Nov 29, 2012

In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST msr
while setting the tsc value. It is anticipated that this capability is
useful for other tasks.
Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8fe8ab46

KVM: Fix user memslot overlap check · 5419369e

Alex Williamson authored Nov 29, 2012

Prior to memory slot sorting this loop compared all of the user memory
slots for overlap with new entries.  With memory slot sorting, we're
just checking some number of entries in the array that may or may not
be user slots.  Instead, walk all the slots with kvm_for_each_memslot,
which has the added benefit of terminating early when we hit the first
empty slot, and skip comparison to private slots.

Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5419369e

29 Nov, 2012 2 commits

KVM: VMX: fix memory order between loading vmcs and clearing vmcs · 5a560f8b

Xiao Guangrong authored Nov 28, 2012

vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs
does not exist on any vcpu

If vcpu load vmcs with vmcs.cpu = -1, it can be directly added to cpu's percpu
list. The list can be corrupted if the cpu prefetch the vmcs's list before
reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before
making vmcs->vcpu == -1 be visible
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5a560f8b

KVM: VMX: fix invalid cpu passed to smp_call_function_single · e6c7d321

Xiao Guangrong authored Nov 28, 2012

In loaded_vmcs_clear, loaded_vmcs->cpu is the fist parameter passed to
smp_call_function_single, if the target cpu is downing (doing cpu hot remove),
loaded_vmcs->cpu can become -1 then -1 is passed to smp_call_function_single

It can be triggered when vcpu is being destroyed, loaded_vmcs_clear is called
in the preemptionable context
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e6c7d321

28 Nov, 2012 19 commits

KVM: use is_idle_task() instead of idle_cpu() to decide when to halt in async_pf · 859f8450

Gleb Natapov authored Nov 28, 2012

As Frederic pointed idle_cpu() may return false even if async fault
happened in the idle task if wake up is pending. In this case the code
will try to put idle task to sleep. Fix this by using is_idle_task() to
check for idle task.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

859f8450

KVM: x86: update pvclock area conditionally, on cpu migration · d98d07ca

Marcelo Tosatti authored Nov 27, 2012

As requested by Glauber, do not update kvmclock area on vcpu->pcpu
migration, in case the host has stable TSC.

This is to reduce cacheline bouncing.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d98d07ca

KVM: x86: require matched TSC offsets for master clock · b48aa97e

Marcelo Tosatti authored Nov 27, 2012

With master clock, a pvclock clock read calculates:

ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]

Where 'rdtsc' is the host TSC.

system_timestamp and tsc_timestamp are unique, one tuple
per VM: the "master clock".

Given a host with synchronized TSCs, its obvious that
guest TSC must be matched for the above to guarantee monotonicity.

Allow master clock usage only if guest TSCs are synchronized.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b48aa97e

KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization · 42897d86
Marcelo Tosatti authored Nov 27, 2012
```
TSC initialization will soon make use of online_vcpus.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
```
42897d86

KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag · d828199e

Marcelo Tosatti authored Nov 27, 2012

KVM added a global variable to guarantee monotonicity in the guest.
One of the reasons for that is that the time between

	1. ktime_get_ts(&timespec);
	2. rdtscll(tsc);

Is variable. That is, given a host with stable TSC, suppose that
two VCPUs read the same time via ktime_get_ts() above.

The time required to execute 2. is not the same on those two instances
executing in different VCPUS (cache misses, interrupts...).

If the TSC value that is used by the host to interpolate when
calculating the monotonic time is the same value used to calculate
the tsc_timestamp value stored in the pvclock data structure, and
a single <system_timestamp, tsc_timestamp> tuple is visible to all
vcpus simultaneously, this problem disappears. See comment on top
of pvclock_update_vm_gtod_copy for details.

Monotonicity is then guaranteed by synchronicity of the host TSCs
and guest TSCs.

Set TSC stable pvclock flag in that case, allowing the guest to read
clock from userspace.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d828199e

KVM: x86: notifier for clocksource changes · 16e8d74d

Marcelo Tosatti authored Nov 27, 2012

Register a notifier for clocksource change event. In case
the host switches to clock other than TSC, disable master
clock usage.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

16e8d74d

time: export time information for KVM pvclock · e0b306fe

Marcelo Tosatti authored Nov 27, 2012

As suggested by John, export time data similarly to how its
done by vsyscall support. This allows KVM to retrieve necessary
information to implement vsyscall support in KVM guests.
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e0b306fe

KVM: x86: pass host_tsc to read_l1_tsc · 886b470c

Marcelo Tosatti authored Nov 27, 2012

Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

886b470c

x86: vdso: pvclock gettime support · 51c19b4f

Marcelo Tosatti authored Nov 27, 2012

Improve performance of time system calls when using Linux pvclock,
by reading time info from fixmap visible copy of pvclock data.

Originally from Jeremy Fitzhardinge.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

51c19b4f

x86: kvm guest: pvclock vsyscall support · 3dc4f7cf

Marcelo Tosatti authored Nov 27, 2012

Hook into generic pvclock vsyscall code, with the aim to
allow userspace to have visibility into pvclock data.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3dc4f7cf

x86: pvclock: generic pvclock vsyscall initialization · 71056ae2

Marcelo Tosatti authored Nov 27, 2012

Originally from Jeremy Fitzhardinge.

Introduce generic, non hypervisor specific, pvclock initialization
routines.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

71056ae2

sched: add notifier for cross-cpu migrations · 582b336e

Marcelo Tosatti authored Nov 27, 2012

Originally from Jeremy Fitzhardinge.
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

582b336e

x86: pvclock: add note about rdtsc barriers · 189e1173

Marcelo Tosatti authored Nov 27, 2012

As noted by Gleb, not advertising SSE2 support implies
no RDTSC barriers.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

189e1173

x86: pvclock: introduce helper to read flags · 2697902b

Marcelo Tosatti authored Nov 27, 2012

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2697902b

x86: pvclock: create helper for pvclock data retrieval · dce2db0a

Marcelo Tosatti authored Nov 27, 2012

Originally from Jeremy Fitzhardinge.

So code can be reused.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

dce2db0a

x86: pvclock: remove pvclock_shadow_time · 42b5637d

Marcelo Tosatti authored Nov 27, 2012

Originally from Jeremy Fitzhardinge.

We can copy the information directly from "struct pvclock_vcpu_time_info",
remove pvclock_shadow_time.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

42b5637d

x86: pvclock: make sure rdtsc doesnt speculate out of region · b01578de

Marcelo Tosatti authored Nov 27, 2012

Originally from Jeremy Fitzhardinge.

pvclock_get_time_values, which contains the memory barriers
will be removed by next patch.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b01578de

x86: kvmclock: allocate pvclock shared memory area · 7069ed67

Marcelo Tosatti authored Nov 27, 2012

We want to expose the pvclock shared memory areas, which
the hypervisor periodically updates, to userspace.

For a linear mapping from userspace, it is necessary that
entire page sized regions are used for array of pvclock
structures.

There is no such guarantee with per cpu areas, therefore move
to memblock_alloc based allocation.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7069ed67

KVM: x86: retain pvclock guest stopped bit in guest memory · 78c0337a

Marcelo Tosatti authored Nov 27, 2012

Otherwise its possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU
migration) to clear the bit.

Noticed by Paolo Bonzini.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

78c0337a

14 Nov, 2012 3 commits

KVM: remove unnecessary return value check · 807f12e5

Guo Chao authored Nov 02, 2012

No need to check return value before breaking switch.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

807f12e5

KVM: x86: fix return value of kvm_vm_ioctl_set_tss_addr() · 951179ce

Guo Chao authored Nov 02, 2012

Return value of this function will be that of ioctl().

#include <stdio.h>
#include <linux/kvm.h>

int main () {
	int fd;
	fd = open ("/dev/kvm", 0);
	fd = ioctl (fd, KVM_CREATE_VM, 0);
	ioctl (fd, KVM_SET_TSS_ADDR, 0xfffff000);
	perror ("");
	return 0;
}

Output is "Operation not permitted". That's not what
we want.

Return -EINVAL in this case.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

951179ce

KVM: do not kfree error pointer · 18595411

Guo Chao authored Nov 02, 2012

We should avoid kfree()ing error pointer in kvm_vcpu_ioctl() and
kvm_arch_vcpu_ioctl().
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

18595411

01 Nov, 2012 2 commits

Merge branch 'for-queue' of https://github.com/agraf/linux-2.6 into queue · f026399f

Marcelo Tosatti authored Oct 31, 2012

* 'for-queue' of https://github.com/agraf/linux-2.6:
  PPC: ePAPR: Convert hcall header to uapi (round 2)
  KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
  KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
  KVM: PPC: Book3S HV: Fix accounting of stolen time
  KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
  KVM: PPC: Book3S HV: Fixes for late-joining threads
  KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
  KVM: PPC: Book3S HV: Fix some races in starting secondary threads
  KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
  PPC: ePAPR: Convert header to uapi
  KVM: PPC: Move mtspr/mfspr emulation into own functions
  KVM: Documentation: Fix reentry-to-be-consistent paragraph
  KVM: PPC: 44x: fix DCR read/write

f026399f

KVM: SVM: update MAINTAINERS entry · 7de609c8

Joerg Roedel authored Oct 29, 2012

I have no access to my AMD email address anymore. Update
entry in MAINTAINERS to the new address.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7de609c8

31 Oct, 2012 2 commits

PPC: ePAPR: Convert hcall header to uapi (round 2) · 63a19091

Alexander Graf authored Oct 31, 2012

The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>

63a19091

Merge commit 'origin/queue' into for-queue · 0588000e
Alexander Graf authored Oct 31, 2012
```
Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/uapi/asm/Kbuild
```
0588000e

30 Oct, 2012 1 commit

KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte() · 8b5869ad

Paul Mackerras authored Oct 15, 2012

This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the "bits" variable.  Nevertheless it should still be fixed.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

8b5869ad