Commits · daf727225b8abfdfe424716abac3d15a3ac5626a · Kirill Smelkov / linux

03 Nov, 2013 1 commit

KVM: x86: fix emulation of "movzbl %bpl, %eax" · daf72722

Paolo Bonzini authored Oct 31, 2013

When I was looking at RHEL5.9's failure to start with
unrestricted_guest=0/emulate_invalid_guest_state=1, I got it working with a
slightly older tree than kvm.git.  I now debugged the remaining failure,
which was introduced by commit 660696d1 (KVM: X86 emulator: fix
source operand decoding for 8bit mov[zs]x instructions, 2013-04-24)
introduced a similar mis-emulation to the one in commit 8acb4207 (KVM:
fix sil/dil/bpl/spl in the mod/rm fields, 2013-05-30).  The incorrect
decoding occurs in 8-bit movzx/movsx instructions whose 8-bit operand
is sil/dil/bpl/spl.

Needless to say, "movzbl %bpl, %eax" does occur in RHEL5.9's decompression
prolog, just a handful of instructions before finally giving control to
the decompressed vmlinux and getting out of the invalid guest state.

Because OpMem8 bypasses decode_modrm, the same handling of the REX prefix
must be applied to OpMem8.
Reported-by: Michele Baldessari <michele@redhat.com>
Cc: stable@vger.kernel.org
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

daf72722

31 Oct, 2013 8 commits

kvm_host: typo fix · 81e87e26

Michael S. Tsirkin authored Oct 30, 2013

fix up typo in comment.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81e87e26

KVM: x86: emulate SAHF instruction · 98f73630

Paolo Bonzini authored Oct 31, 2013

Yet another instruction that we fail to emulate, this time found
in Windows 2008R2 32-bit.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

98f73630

MAINTAINERS: add tree for kvm.git · a94b40a6

Ramkumar Ramachandra authored Oct 31, 2013

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: KVM List <kvm@vger.kernel.org>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a94b40a6

Documentation/kvm: add a 00-INDEX file · 6beda1e5

Ramkumar Ramachandra authored Oct 31, 2013

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
[Some editing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6beda1e5

MAINTAINERS: fix broken link to www.linux-kvm.org · e3e58478

Ramkumar Ramachandra authored Oct 31, 2013

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e3e58478

kvm/vmx: error message typo fix · 60266204

Michael S. Tsirkin authored Oct 31, 2013

mst can't be blamed for lack of switch entries: the
issue is with msrs actually.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

60266204

KVM: x86: fix KVM_SET_XCRS loop · c67a04cb

Paolo Bonzini authored Oct 17, 2013

The loop was always using 0 as the index.  This means that
any rubbish after the first element of the array went undetected.
It seems reasonable to assume that no KVM userspace did that.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c67a04cb

KVM: x86: fix KVM_SET_XCRS for CPUs that do not support XSAVE · 46c34cb0

Paolo Bonzini authored Oct 17, 2013

The KVM_SET_XCRS ioctl must accept anything that KVM_GET_XCRS
could return.  XCR0's bit 0 is always 1 in real processors with
XSAVE, and KVM_GET_XCRS will always leave bit 0 set even if the
emulated processor does not have XSAVE.  So, KVM_SET_XCRS must
ignore that bit when checking for attempts to enable unsupported
save states.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

46c34cb0

30 Oct, 2013 8 commits

kvm: Create non-coherent DMA registeration · e0f0bbc5

Alex Williamson authored Oct 30, 2013

We currently use some ad-hoc arch variables tied to legacy KVM device
assignment to manage emulation of instructions that depend on whether
non-coherent DMA is present. Create an interface for this, adapting
legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
For now we assume that non-coherent DMA is possible any time we have a
VFIO group. Eventually an interface can be developed as part of the
VFIO external user interface to query the coherency of a group.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e0f0bbc5

kvm/x86: Convert iommu_flags to iommu_noncoherent · d96eb2c6

Alex Williamson authored Oct 30, 2013

Default to operating in coherent mode.  This simplifies the logic when
we switch to a model of registering and unregistering noncoherent I/O
with KVM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d96eb2c6

kvm: Add VFIO device · ec53500f

Alex Williamson authored Oct 30, 2013

So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but areas are cropping up where a connection beyond eventfds
and irqfds needs to be made. This patch introduces a KVM-VFIO device
that is meant to be a gateway for such interaction. The user creates
the device and can add and remove VFIO groups to it via file
descriptors. When a group is added, KVM verifies the group is valid
and gets a reference to it via the VFIO external user interface.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ec53500f

kvm: Emulate MOVBE · 84cffe49

Borislav Petkov authored Oct 29, 2013

This basically came from the need to be able to boot 32-bit Atom SMP
guests on an AMD host, i.e. a host which doesn't support MOVBE. As a
matter of fact, qemu has since recently received MOVBE support but we
cannot share that with kvm emulation and thus we have to do this in the
host. We're waay faster in kvm anyway. :-)

So, we piggyback on the #UD path and emulate the MOVBE functionality.
With it, an 8-core SMP guest boots in under 6 seconds.

Also, requesting MOVBE emulation needs to happen explicitly to work,
i.e. qemu -cpu n270,+movbe...

Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+
kernel in kvm executes MOVBE ~60K times.
Signed-off-by: Andre Przywara <andre@andrep.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

84cffe49

kvm, emulator: Add initial three-byte insns support · 0bc5eedb

Borislav Petkov authored Oct 29, 2013

Add initial support for handling three-byte instructions in the
emulator.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0bc5eedb

kvm, emulator: Rename VendorSpecific flag · b51e974f

Borislav Petkov authored Sep 22, 2013

Call it EmulateOnUD which is exactly what we're trying to do with
vendor-specific instructions.

Rename ->only_vendor_specific_insn to something shorter, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b51e974f

kvm, emulator: Use opcode length · 1ce19dc1

Borislav Petkov authored Sep 22, 2013

Add a field to the current emulation context which contains the
instruction opcode length. This will streamline handling of opcodes of
different length.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1ce19dc1

kvm: Add KVM_GET_EMULATED_CPUID · 9c15bb1d

Borislav Petkov authored Sep 22, 2013

Add a kvm ioctl which states which system functionality kvm emulates.
The format used is that of CPUID and we return the corresponding CPUID
bits set for which we do emulate functionality.

Make sure ->padding is being passed on clean from userspace so that we
can use it for something in the future, after the ioctl gets cast in
stone.

s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9c15bb1d

28 Oct, 2013 5 commits

Merge tag 'kvm-arm-for-3.13-2' of git://git.linaro.org/people/cdall/linux-kvm-arm into kvm-queue · 5bb3398d

Paolo Bonzini authored Oct 28, 2013

Updates for KVM/ARM, take 2 including:
 - Transparent Huge Pages and hugetlbfs support for KVM/ARM
 - Yield CPU when guest executes WFE to speed up CPU overcommit

5bb3398d

KVM: Mapping IOMMU pages after updating memslot · e0230e13

Yang Zhang authored Oct 24, 2013

In kvm_iommu_map_pages(), we need to know the page size via call
kvm_host_page_size(). And it will check whether the target slot
is valid before return the right page size.
Currently, we will map the iommu pages when creating a new slot.
But we call kvm_iommu_map_pages() during preparing the new slot.
At that time, the new slot is not visible by domain(still in preparing).
So we cannot get the right page size from kvm_host_page_size() and
this will break the IOMMU super page logic.
The solution is to map the iommu pages after we insert the new slot
into domain.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e0230e13

nVMX: Report CPU_BASED_VIRTUAL_NMI_PENDING as supported · a294c9bb

Jan Kiszka authored Oct 23, 2013

If the host supports it, we can and should expose it to the guest as
well, just like we already do with PIN_BASED_VIRTUAL_NMIS.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a294c9bb

nVMX: Fix pick-up of uninjected NMIs · cd2633c5

Jan Kiszka authored Oct 23, 2013

__vmx_complete_interrupts stored uninjected NMIs in arch.nmi_injected,
not arch.nmi_pending. So we actually need to check the former field in
vmcs12_save_pending_event. This fixes the eventinj unit test when run
in nested KVM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cd2633c5

KVM: nVMX: Report 2MB EPT pages as supported · d3134dbf

Jan Kiszka authored Oct 23, 2013

As long as the hardware provides us 2MB EPT pages, we can also expose
them to the guest because our shadow EPT code already supports this
feature.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d3134dbf

18 Oct, 2013 2 commits

KVM: ARM: Transparent huge page (THP) support · 9b5fdb97

Christoffer Dall authored Oct 02, 2013

Support transparent huge pages in KVM/ARM and KVM/ARM64.  The
transparent_hugepage_adjust is not very pretty, but this is also how
it's solved on x86 and seems to be simply an artifact on how THPs
behave.  This should eventually be shared across architectures if
possible, but that can always be changed down the road.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

9b5fdb97

KVM: ARM: Support hugetlbfs backed huge pages · ad361f09

Christoffer Dall authored Nov 01, 2012

Support huge pages in KVM/ARM and KVM/ARM64.  The pud_huge checking on
the unmap path may feel a bit silly as the pud_huge check is always
defined to false, but the compiler should be smart about this.

Note: This deals only with VMAs marked as huge which are allocated by
users through hugetlbfs only.  Transparent huge pages can only be
detected by looking at the underlying pages (or the page tables
themselves) and this patch so far simply maps these on a page-by-page
level in the Stage-2 page tables.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

ad361f09

17 Oct, 2013 3 commits

KVM: ARM: Update comments for kvm_handle_wfi · 86ed81aa

Christoffer Dall authored Oct 15, 2013

Update comments to reflect what is really going on and add the TWE bit
to the comments in kvm_arm.h.

Also renames the function to kvm_handle_wfx like is done on arm64 for
consistency and uber-correctness.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

86ed81aa

ARM: KVM: Yield CPU when vcpu executes a WFE · 58d5ec8f

Marc Zyngier authored Oct 08, 2013

On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.

This creates contention, and the observed slowdown is 40x for
hackbench. No, this isn't a typo.

The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommited.

Quick test to estimate the performance: hackbench 1 process 1000

2xA15 host (baseline):	1.843s

2xA15 guest w/o patch:	2.083s
4xA15 guest w/o patch:	80.212s
8xA15 guest w/o patch:	Could not be bothered to find out

2xA15 guest w/ patch:	2.102s
4xA15 guest w/ patch:	3.205s
8xA15 guest w/ patch:	6.887s

So we go from a 40x degradation to 1.5x in the 2x overcommit case,
which is vaguely more acceptable.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

58d5ec8f

Powerpc KVM work is based on a commit after rc4. · 13acfd57
Gleb Natapov authored Oct 17, 2013
```
Merging master into next to satisfy the dependencies.

Conflicts:
	arch/arm/kvm/reset.c
```
13acfd57

16 Oct, 2013 4 commits

Merge tag 'kvm-arm-for-3.13-1' of git://git.linaro.org/people/cdall/linux-kvm-arm into next · d5701426
Gleb Natapov authored Oct 16, 2013
```
Updates for KVM/ARM including cpu=host and Cortex-A7 support
```
d5701426

KVM: ARM: Remove non-ASCII space characters · a7265fb1

Christoffer Dall authored Oct 15, 2013

Some strange character leaped into the documentation, which makes
git-send-email behave quite strangely.  Get rid of this before it bites
anyone else.

Cc: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

a7265fb1

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux · 34ec4de4

Linus Torvalds authored Oct 15, 2013

Pull device tree fixes and reverts from Grant Likely:
 "One bug fix and three reverts.  The reverts back out the slightly
  controversial feeding the entire device tree into the random pool and
  the reserved-memory binding which isn't fully baked yet.  Expect the
  reserved-memory patches at least to resurface for v3.13.

  The bug fixes removes a scary but harmless warning on SPARC that was
  introduced in the v3.12 merge window.  v3.13 will contain a proper fix
  that makes the new code work on SPARC.

  On the plus side, the diffstat looks *awesome*.  I love removing lines
  of code"

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  Revert "drivers: of: add initialization code for dma reserved memory"
  Revert "ARM: init: add support for reserved memory defined by device tree"
  Revert "of: Feed entire flattened device tree into the random pool"
  of: fix unnecessary warning on missing /cpus node

34ec4de4

Merge branch 'fixes-for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping · ba0a062e

Linus Torvalds authored Oct 15, 2013

Pull DMA-mapping fix from Marek Szyprowski:
 "A bugfix for the IOMMU-based implementation of dma-mapping subsystem
  for ARM architecture"

* 'fixes-for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: dma-mapping: Always pass proper prot flags to iommu_map()

ba0a062e

15 Oct, 2013 8 commits

Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · b83aea88

Linus Torvalds authored Oct 15, 2013

Pull kvm fix from Gleb Natapov.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Enable pvspinlock after jump_label_init() to avoid VM hang

b83aea88

Merge tag 'stable/for-linus-3.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 36704263

Linus Torvalds authored Oct 15, 2013

Pull Xen fixes from Stefano Stabellini:
 "A small fix for Xen on x86_32 and a build fix for xen-tpmfront on
  arm64"

* tag 'stable/for-linus-3.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Fix possible user space selector corruption
  tpm: xen-tpmfront: fix missing declaration of xen_domain

36704263

KVM: Enable pvspinlock after jump_label_init() to avoid VM hang · 3dbef3e3

Raghavendra K T authored Oct 09, 2013

We use jump label to enable pv-spinlock. With the changes in (442e0973
Merge branch 'x86/jumplabel'), the jump label behaviour has changed
that would result in eventual hang of the VM since we would end up in a
situation where slow path locks would halt the vcpus but we will not be
able to wakeup the vcpu by lock releaser using unlock kick.

Similar problem in Xen and more detailed description is available in
a945928e (xen: Do not enable spinlocks before jump_label_init()
has executed)

This patch splits kvm_spinlock_init to separate jump label changes with
pvops patching and also make jump label enabling after jump_label_init().
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

3dbef3e3

KVM: Drop FOLL_GET in GUP when doing async page fault · f2e10669

chai wen authored Oct 14, 2013

Page pinning is not mandatory in kvm async page fault processing since
after async page fault event is delivered to a guest it accesses page once
again and does its own GUP.  Drop the FOLL_GET flag in GUP in async_pf
code, and do some simplifying in check/clear processing.
Suggested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

f2e10669

Revert "drivers: of: add initialization code for dma reserved memory" · 1931ee14

Marek Szyprowski authored Oct 11, 2013

This reverts commit 9d8eab7a. There is
still no consensus on the bindings for the reserved memory and various
drawbacks of the proposed solution has been shown, so the best now is to
revert it completely and start again from scratch later.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>

1931ee14

Revert "ARM: init: add support for reserved memory defined by device tree" · cebf3e40

Marek Szyprowski authored Oct 11, 2013

This reverts commit 10bcdfb8. There is
no consensus on the bindings for the reserved memory, so the code for
handing it will be reverted.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>

cebf3e40

Merge tag 'vfio-v3.12-rc5' of git://github.com/awilliam/linux-vfio · 1e52db69

Linus Torvalds authored Oct 14, 2013

Pull vfio fix from Alex Williamson:
 "Fix an incorrect break out of nested loop in iommu mapping code"

* tag 'vfio-v3.12-rc5' of git://github.com/awilliam/linux-vfio:
  VFIO: vfio_iommu_type1: fix bug caused by break in nested loop

1e52db69

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · ed8ada39

Linus Torvalds authored Oct 14, 2013

Pull infiniband updates from Roland Dreier:
 "Last batch of IB changes for 3.12: many mlx5 hardware driver fixes
  plus one trivial semicolon cleanup"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Remove unnecessary semicolons
  IB/mlx5: Ensure proper synchronization accessing memory
  IB/mlx5: Fix alignment of reg umr gather buffers
  IB/mlx5: Fix eq names to display nicely in /proc/interrupts
  mlx5: Fix error code translation from firmware to driver
  IB/mlx5: Fix opt param mask according to firmware spec
  mlx5: Fix opt param mask for sq err to rts transition
  IB/mlx5: Disable atomic operations
  mlx5: Fix layout of struct mlx5_init_seg
  mlx5: Keep polling to reclaim pages while any returned
  IB/mlx5: Avoid async events on invalid port number
  IB/mlx5: Decrease memory consumption of mr caches
  mlx5: Remove checksum on command interface commands
  IB/mlx5: Fix memory leak in mlx5_ib_create_srq
  IB/mlx5: Flush cache workqueue before destroying it
  IB/mlx5: Fix send work queue size calculation

ed8ada39

14 Oct, 2013 1 commit
- Merge branch 'misc' into for-next · 59b5b28d
  Roland Dreier authored Oct 14, 2013
  
  59b5b28d