Commits · e504c9098ed6acd9e1079c5e10e4910724ad429f · nexedi / linux

13 Nov, 2013 1 commit

kvm, vmx: Fix lazy FPU on nested guest · e504c909

Anthoine Bourgeois authored Nov 13, 2013

If a nested guest does a NM fault but its CR0 doesn't contain the TS
flag (because it was already cleared by the guest with L1 aid) then we
have to activate FPU ourselves in L0 and then continue to L2. If TS flag
is set then we fallback on the previous behavior, forward the fault to
L1 if it asked for.
Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e504c909

11 Nov, 2013 2 commits

Merge tag 'kvm-arm64/for-3.13-1' of... · ede58222

Paolo Bonzini authored Nov 11, 2013

Merge tag 'kvm-arm64/for-3.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next

A handful of fixes for KVM/arm64:

- A couple a basic fixes for running BE guests on a LE host
- A performance improvement for overcommitted VMs (same as the equivalent
  patch for ARM)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Conflicts:
	arch/arm/include/asm/kvm_emulate.h
	arch/arm64/include/asm/kvm_emulate.h

ede58222

Merge tag 'kvm-arm-for-3.13-3' of git://git.linaro.org/people/cdall/linux-kvm-arm into kvm-next · 6da8ae55

Paolo Bonzini authored Nov 11, 2013

Updates for KVM/ARM, take 3 supporting more than 4 CPUs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Conflicts:
	arch/arm/kvm/reset.c [cpu_reset->reset_regs change; context only]

6da8ae55

07 Nov, 2013 3 commits

arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu · ce94fe93

Marc Zyngier authored Nov 05, 2013

When booting a vcpu using PSCI, make sure we start it with the
endianness of the caller. Otherwise, secondaries can be pretty
unhappy to execute a BE kernel in LE mode...

This conforms to PSCI spec Rev B, 5.13.3.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

ce94fe93

arm/arm64: KVM: MMIO support for BE guest · 6d89d2d9

Marc Zyngier authored Feb 12, 2013

Do the necessary byteswap when host and guest have different
views of the universe. Actually, the only case we need to take
care of is when the guest is BE. All the other cases are naturally
handled.

Also be careful about endianness when the data is being memcopy-ed
from/to the run buffer.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

6d89d2d9

kvm, cpuid: Fix sparse warning · 1b2ca422

Borislav Petkov authored Nov 06, 2013

We need to copy padding to kernel space first before looking at it.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

1b2ca422

06 Nov, 2013 7 commits

kvm: Delete prototype for non-existent function kvm_check_iopl · a890b6fe

Josh Triplett authored Oct 20, 2013

The prototype for kvm_check_iopl appeared in commit
f850e2e6 ("KVM: x86 emulator: Check IOPL
level during io instruction emulation"), but the function never actually
existed.  Remove the prototype.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

a890b6fe

kvm: Delete prototype for non-existent function complete_pio · 35a5121b

Josh Triplett authored Oct 20, 2013

complete_pio ceased to exist in commit
7972995b ("KVM: x86 emulator: Move
string pio emulation into emulator.c"), but the prototype remained.
Remove its prototype.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

35a5121b

hung_task: add method to reset detector · 8b414521

Marcelo Tosatti authored Oct 11, 2013

In certain occasions it is possible for a hung task detector
positive to be false: continuation from a paused VM, for example.

Add a method to reset detection, similar as is done
with other kernel watchdogs.
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

8b414521

pvclock: detect watchdog reset at pvclock read · d63285e9

Marcelo Tosatti authored Oct 11, 2013

Implement reset of kernel watchdogs at pvclock read time. This avoids
adding special code to every watchdog.

This is possible for watchdogs which measure time based on sched_clock() or
ktime_get() variants.

Suggested by Don Zickus.
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

d63285e9

kvm: optimize out smp_mb after srcu_read_unlock · 01b71917

Michael S. Tsirkin authored Nov 04, 2013

I noticed that srcu_read_lock/unlock both have a memory barrier,
so just by moving srcu_read_unlock earlier we can get rid of
one call to smp_mb() using smp_mb__after_srcu_read_unlock instead.

Unsurprisingly, the gain is small but measureable using the unit test
microbenchmark:
before
        vmcall in the ballpark of 1410 cycles
after
        vmcall in the ballpark of 1360 cycles
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

01b71917

srcu: API for barrier after srcu read unlock · ce332f66

Michael S. Tsirkin authored Nov 04, 2013

srcu read lock/unlock include a full memory barrier
but that's an implementation detail.
Add an API for make memory fencing explicit for
users that need this barrier, to make sure we
can change it as needed without breaking all users.
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

ce332f66

KVM: remove vm mmap method · 80f5b5e7

Gleb Natapov authored Nov 05, 2013

It was used in conjunction with KVM_SET_MEMORY_REGION ioctl which was
removed by b74a07be in 2010, QEMU stopped using it in 2008, so
it is time to remove the code finally.
Signed-off-by: Gleb Natapov <gleb@redhat.com>

80f5b5e7

05 Nov, 2013 4 commits

KVM: IOMMU: hva align mapping page size · 27ef63c7

Greg Edwards authored Nov 04, 2013

When determining the page size we could use to map with the IOMMU, the
page size should also be aligned with the hva, not just the gfn. The
gfn may not reflect the real alignment within the hugetlbfs file.

Most of the time, this works fine. However, if the hugetlbfs file is
backed by non-contiguous huge pages, a multi-huge page memslot starts at
an unaligned offset within the hugetlbfs file, and the gfn is aligned
with respect to the huge page size, kvm_host_page_size() will return the
huge page size and we will use that to map with the IOMMU.

When we later unpin that same memslot, the IOMMU returns the unmap size
as the huge page size, and we happily unpin that many pfns in
monotonically increasing order, not realizing we are spanning
non-contiguous huge pages and partially unpin the wrong huge page.

Ensure the IOMMU mapping page size is aligned with the hva corresponding
to the gfn, which does reflect the alignment within the hugetlbfs file.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Cc: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>

27ef63c7

KVM: x86: trace cpuid emulation when called from emulator · a9d4e439

Gleb Natapov authored Nov 04, 2013

Currently cpuid emulation is traced only when executed by intercept.
Move trace point so that emulator invocation is traced too.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

a9d4e439

KVM: emulator: cleanup decode_register_operand() a bit · 6d4d85ec

Gleb Natapov authored Nov 04, 2013

Make code shorter.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

6d4d85ec

KVM: emulator: check rex prefix inside decode_register() · aa9ac1a6

Gleb Natapov authored Nov 04, 2013

All decode_register() callers check if instruction has rex prefix
to properly decode one byte operand. It make sense to move the check
inside.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

aa9ac1a6

04 Nov, 2013 1 commit
- Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queue · 95f328d3
  Gleb Natapov authored Nov 04, 2013
```
Conflicts:
	arch/powerpc/include/asm/processor.h
```
  95f328d3
03 Nov, 2013 1 commit

KVM: x86: fix emulation of "movzbl %bpl, %eax" · daf72722

Paolo Bonzini authored Oct 31, 2013

When I was looking at RHEL5.9's failure to start with
unrestricted_guest=0/emulate_invalid_guest_state=1, I got it working with a
slightly older tree than kvm.git.  I now debugged the remaining failure,
which was introduced by commit 660696d1 (KVM: X86 emulator: fix
source operand decoding for 8bit mov[zs]x instructions, 2013-04-24)
introduced a similar mis-emulation to the one in commit 8acb4207 (KVM:
fix sil/dil/bpl/spl in the mod/rm fields, 2013-05-30).  The incorrect
decoding occurs in 8-bit movzx/movsx instructions whose 8-bit operand
is sil/dil/bpl/spl.

Needless to say, "movzbl %bpl, %eax" does occur in RHEL5.9's decompression
prolog, just a handful of instructions before finally giving control to
the decompressed vmlinux and getting out of the invalid guest state.

Because OpMem8 bypasses decode_modrm, the same handling of the REX prefix
must be applied to OpMem8.
Reported-by: Michele Baldessari <michele@redhat.com>
Cc: stable@vger.kernel.org
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

daf72722

31 Oct, 2013 8 commits

kvm_host: typo fix · 81e87e26

Michael S. Tsirkin authored Oct 30, 2013

fix up typo in comment.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81e87e26

KVM: x86: emulate SAHF instruction · 98f73630

Paolo Bonzini authored Oct 31, 2013

Yet another instruction that we fail to emulate, this time found
in Windows 2008R2 32-bit.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

98f73630

MAINTAINERS: add tree for kvm.git · a94b40a6

Ramkumar Ramachandra authored Oct 31, 2013

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: KVM List <kvm@vger.kernel.org>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a94b40a6

Documentation/kvm: add a 00-INDEX file · 6beda1e5

Ramkumar Ramachandra authored Oct 31, 2013

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
[Some editing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6beda1e5

MAINTAINERS: fix broken link to www.linux-kvm.org · e3e58478

Ramkumar Ramachandra authored Oct 31, 2013

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e3e58478

kvm/vmx: error message typo fix · 60266204

Michael S. Tsirkin authored Oct 31, 2013

mst can't be blamed for lack of switch entries: the
issue is with msrs actually.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

60266204

KVM: x86: fix KVM_SET_XCRS loop · c67a04cb

Paolo Bonzini authored Oct 17, 2013

The loop was always using 0 as the index.  This means that
any rubbish after the first element of the array went undetected.
It seems reasonable to assume that no KVM userspace did that.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c67a04cb

KVM: x86: fix KVM_SET_XCRS for CPUs that do not support XSAVE · 46c34cb0

Paolo Bonzini authored Oct 17, 2013

The KVM_SET_XCRS ioctl must accept anything that KVM_GET_XCRS
could return.  XCR0's bit 0 is always 1 in real processors with
XSAVE, and KVM_GET_XCRS will always leave bit 0 set even if the
emulated processor does not have XSAVE.  So, KVM_SET_XCRS must
ignore that bit when checking for attempts to enable unsupported
save states.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

46c34cb0

30 Oct, 2013 8 commits

kvm: Create non-coherent DMA registeration · e0f0bbc5

Alex Williamson authored Oct 30, 2013

We currently use some ad-hoc arch variables tied to legacy KVM device
assignment to manage emulation of instructions that depend on whether
non-coherent DMA is present. Create an interface for this, adapting
legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
For now we assume that non-coherent DMA is possible any time we have a
VFIO group. Eventually an interface can be developed as part of the
VFIO external user interface to query the coherency of a group.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e0f0bbc5

kvm/x86: Convert iommu_flags to iommu_noncoherent · d96eb2c6

Alex Williamson authored Oct 30, 2013

Default to operating in coherent mode.  This simplifies the logic when
we switch to a model of registering and unregistering noncoherent I/O
with KVM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d96eb2c6

kvm: Add VFIO device · ec53500f

Alex Williamson authored Oct 30, 2013

So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but areas are cropping up where a connection beyond eventfds
and irqfds needs to be made. This patch introduces a KVM-VFIO device
that is meant to be a gateway for such interaction. The user creates
the device and can add and remove VFIO groups to it via file
descriptors. When a group is added, KVM verifies the group is valid
and gets a reference to it via the VFIO external user interface.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ec53500f

kvm: Emulate MOVBE · 84cffe49

Borislav Petkov authored Oct 29, 2013

This basically came from the need to be able to boot 32-bit Atom SMP
guests on an AMD host, i.e. a host which doesn't support MOVBE. As a
matter of fact, qemu has since recently received MOVBE support but we
cannot share that with kvm emulation and thus we have to do this in the
host. We're waay faster in kvm anyway. :-)

So, we piggyback on the #UD path and emulate the MOVBE functionality.
With it, an 8-core SMP guest boots in under 6 seconds.

Also, requesting MOVBE emulation needs to happen explicitly to work,
i.e. qemu -cpu n270,+movbe...

Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+
kernel in kvm executes MOVBE ~60K times.
Signed-off-by: Andre Przywara <andre@andrep.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

84cffe49

kvm, emulator: Add initial three-byte insns support · 0bc5eedb

Borislav Petkov authored Oct 29, 2013

Add initial support for handling three-byte instructions in the
emulator.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0bc5eedb

kvm, emulator: Rename VendorSpecific flag · b51e974f

Borislav Petkov authored Sep 22, 2013

Call it EmulateOnUD which is exactly what we're trying to do with
vendor-specific instructions.

Rename ->only_vendor_specific_insn to something shorter, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b51e974f

kvm, emulator: Use opcode length · 1ce19dc1

Borislav Petkov authored Sep 22, 2013

Add a field to the current emulation context which contains the
instruction opcode length. This will streamline handling of opcodes of
different length.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1ce19dc1

kvm: Add KVM_GET_EMULATED_CPUID · 9c15bb1d

Borislav Petkov authored Sep 22, 2013

Add a kvm ioctl which states which system functionality kvm emulates.
The format used is that of CPUID and we return the corresponding CPUID
bits set for which we do emulate functionality.

Make sure ->padding is being passed on clean from userspace so that we
can use it for something in the future, after the ioctl gets cast in
stone.

s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9c15bb1d

29 Oct, 2013 1 commit

arm64: KVM: Yield CPU when vcpu executes a WFE · d241aac7

Marc Zyngier authored Aug 02, 2013

On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.

The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommited.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

d241aac7

28 Oct, 2013 4 commits

Merge tag 'kvm-arm-for-3.13-2' of git://git.linaro.org/people/cdall/linux-kvm-arm into kvm-queue · 5bb3398d

Paolo Bonzini authored Oct 28, 2013

Updates for KVM/ARM, take 2 including:
 - Transparent Huge Pages and hugetlbfs support for KVM/ARM
 - Yield CPU when guest executes WFE to speed up CPU overcommit

5bb3398d

KVM: Mapping IOMMU pages after updating memslot · e0230e13

Yang Zhang authored Oct 24, 2013

In kvm_iommu_map_pages(), we need to know the page size via call
kvm_host_page_size(). And it will check whether the target slot
is valid before return the right page size.
Currently, we will map the iommu pages when creating a new slot.
But we call kvm_iommu_map_pages() during preparing the new slot.
At that time, the new slot is not visible by domain(still in preparing).
So we cannot get the right page size from kvm_host_page_size() and
this will break the IOMMU super page logic.
The solution is to map the iommu pages after we insert the new slot
into domain.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e0230e13

nVMX: Report CPU_BASED_VIRTUAL_NMI_PENDING as supported · a294c9bb

Jan Kiszka authored Oct 23, 2013

If the host supports it, we can and should expose it to the guest as
well, just like we already do with PIN_BASED_VIRTUAL_NMIS.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a294c9bb

nVMX: Fix pick-up of uninjected NMIs · cd2633c5

Jan Kiszka authored Oct 23, 2013

__vmx_complete_interrupts stored uninjected NMIs in arch.nmi_injected,
not arch.nmi_pending. So we actually need to check the former field in
vmcs12_save_pending_event. This fixes the eventinj unit test when run
in nested KVM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cd2633c5