Commits · 6ef1522f7ecc063317dfb5ca63da6e47130a4c50 · Kirill Smelkov / linux

01 Oct, 2015 40 commits

KVM: Extend struct pi_desc for VT-d Posted-Interrupts · 6ef1522f

Feng Wu authored Sep 18, 2015

Extend struct pi_desc for VT-d Posted-Interrupts.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6ef1522f

KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' · f70c20aa

Feng Wu authored Sep 18, 2015

This patch adds an arch-specific hook, 'arch_update', to
'struct kvm_kernel_irqfd'. On the Intel side, it is used to
update the IRTE when VT-d posted interrupts are in use.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f70c20aa

KVM: eventfd: add irq bypass consumer management · 9016cfb5

Eric Auger authored Sep 18, 2015

This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9016cfb5

KVM: introduce kvm_arch functions for IRQ bypass · 1a02b270

Eric Auger authored Sep 18, 2015

This patch introduces
- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_start

They make it possible to specialize the KVM IRQ bypass consumer when
CONFIG_KVM_HAVE_IRQ_BYPASS is set.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
[Add weak implementations of the callbacks. - Feng]
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1a02b270

KVM: create kvm_irqfd.h · 166c9775

Eric Auger authored Sep 18, 2015

Move the _irqfd_resampler and _irqfd struct declarations into a new
public header, kvm_irqfd.h. They are renamed to
kvm_kernel_irqfd_resampler and kvm_kernel_irqfd, respectively. These
data types will be used by architecture-specific code for
IRQ bypass manager integration.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

166c9775

virt: Add virt directory to the top Makefile · 37d9fe47

Feng Wu authored Sep 22, 2015

We need to build files in virt/lib/, which are now used by
KVM and VFIO, so add virt directory to the top Makefile.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

37d9fe47

virt: IRQ bypass manager · f73f8173

Alex Williamson authored Sep 18, 2015

When a physical I/O device is assigned to a virtual machine through
facilities like VFIO and KVM, the interrupt for the device generally
bounces through the host system before being injected into the VM.
However, hardware technologies exist that often allow the host to be
bypassed for some of these scenarios.  Intel Posted Interrupts allow
the specified physical edge interrupts to be directly injected into a
guest when delivered to a physical processor while the vCPU is
running.  ARM IRQ Forwarding allows forwarded physical interrupts to
be directly deactivated by the guest.

The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanisms.  To do this, we base the connection on a
shared, opaque token.  For KVM-VFIO this is expected to be an
eventfd_ctx since this is the connection we already use to connect an
eventfd to an irqfd on the in-kernel path.  When a producer and
consumer with matching tokens are found, callbacks via both registered
participants allow the bypass facilities to be automatically enabled.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f73f8173
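
The token matching described in the IRQ bypass manager commit above can be illustrated with a small userspace sketch. This is not the kernel's implementation: the structures `bypass_producer` and `bypass_consumer`, the `connected` flag, and the function `bypass_connect` are hypothetical names chosen for illustration, and the callbacks that the real manager invokes on both sides are reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins: only the opaque token and a
 * "connected" flag are modelled, not the kernel's real structures. */
struct bypass_producer {
    const void *token;
    int connected;
};

struct bypass_consumer {
    const void *token;
    int connected;
};

/*
 * Pair the consumer with the first unconnected producer that carries
 * the same token. Returns the producer's index, or -1 if none matches.
 */
int bypass_connect(struct bypass_producer *prods, size_t n,
                   struct bypass_consumer *cons)
{
    assert(prods != NULL && cons != NULL);

    for (size_t i = 0; i < n; i++) {
        if (prods[i].token == cons->token && !prods[i].connected) {
            /* The real manager would run callbacks on both sides here. */
            prods[i].connected = 1;
            cons->connected = 1;
            return (int)i;
        }
    }
    return -1;
}
```

Because the token is compared only by pointer identity, neither side needs to know what it points to; for KVM-VFIO the commit message expects it to be an eventfd_ctx.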

irq_remapping: move structs outside #ifdef · 18cd52c4

Paolo Bonzini authored Sep 18, 2015

This is friendlier to clients of the code, who are going to prepare
vcpu_data structs unconditionally, even if CONFIG_IRQ_REMAP is not
defined.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

18cd52c4

x86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO · 72c930dc

Radim Krčmář authored Sep 18, 2015

Newer KVM won't be exposing PVCLOCK_COUNTS_FROM_ZERO anymore.
The purpose of that flag was to start counting system time from 0 when
the KVM clock has been initialized.
We can achieve the same by selecting one read as the initial point.

A simple subtraction will work unless the KVM clock count overflows
earlier (has smaller width) than scheduler's cycle count.  We should be
safe till x86_128.

Because PVCLOCK_COUNTS_FROM_ZERO was enabled only on new hypervisors,
setting sched clock as stable based on PVCLOCK_TSC_STABLE_BIT might
regress on older ones.

I presume we don't need to change kvm_clock_read instead of introducing
kvm_sched_clock_read.  A problem could arise in case sched_clock is
expected to return the same value as get_cycles, but we should have
merged those clocks in that case.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

72c930dc
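
The "select one read as the initial point" idea from the kvmclock commit above amounts to remembering the first clock value and subtracting it from every later read. A minimal sketch, assuming hypothetical names (`clock_offset`, `sched_clock_sketch_init`, `sched_clock_sketch_read`) and passing the raw clock value in as a parameter rather than reading real hardware:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state: the raw clock value captured at initialization. */
static uint64_t clock_offset;
static bool clock_offset_valid;

/* Record the first raw read as the zero point. */
void sched_clock_sketch_init(uint64_t first_raw_read)
{
    clock_offset = first_raw_read;
    clock_offset_valid = true;
}

/* Later reads are reported relative to the recorded zero point. */
uint64_t sched_clock_sketch_read(uint64_t raw_read)
{
    assert(clock_offset_valid); /* reading before init is a bug in this sketch */
    return raw_read - clock_offset;
}
```

With 64-bit unsigned arithmetic the subtraction stays correct as long as the underlying counter does not wrap at a narrower width, which is the caveat the commit message raises.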

KVM: VMX: drop rdtscp_enabled field · 1cea0ce6

Xiao Guangrong authored Sep 09, 2015

Check the CPUID bit instead.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1cea0ce6

KVM: VMX: clean up bit operation on SECONDARY_VM_EXEC_CONTROL · 7ec36296

Xiao Guangrong authored Sep 09, 2015

Use vmcs_set_bits() and vmcs_clear_bits() to clean up the code
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7ec36296
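
The set-bits and clear-bits helpers named in the commit above replace open-coded read-modify-write sequences. The following userspace sketch shows the pattern only: the control field is simulated with a plain variable, and `field_read`, `field_write`, `field_set_bits`, `field_clear_bits`, and the two mask values are illustrative assumptions, not the kernel's VMCS accessors or real control bits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mask values only; not taken from any specification. */
#define CTRL_BIT_A 0x00000008u
#define CTRL_BIT_B 0x00001000u

/* Stand-in for a hardware control field. */
static uint32_t fake_control_field;

uint32_t field_read(void)
{
    return fake_control_field;
}

void field_write(uint32_t value)
{
    fake_control_field = value;
}

/* Set every bit in mask, leaving the other bits untouched. */
void field_set_bits(uint32_t mask)
{
    assert(mask != 0); /* an empty mask is most likely a caller bug */
    field_write(field_read() | mask);
}

/* Clear every bit in mask, leaving the other bits untouched. */
void field_clear_bits(uint32_t mask)
{
    assert(mask != 0);
    field_write(field_read() & ~mask);
}
```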

KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update · feda805f

Xiao Guangrong authored Sep 09, 2015

Unify the update in vmx_cpuid_update()
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

feda805f

KVM: VMX: align vmx->nested.nested_vmx_secondary_ctls_high to vmx->rdtscp_enabled · 8b97265a

Paolo Bonzini authored Sep 15, 2015

The SECONDARY_EXEC_RDTSCP must be available iff RDTSCP is enabled in the
guest.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8b97265a

KVM: VMX: simplify invpcid handling in vmx_cpuid_update() · 29541bb8

Xiao Guangrong authored Sep 09, 2015

If vmx_invpcid_supported() is true, the secondary execution control
field must be supported and SECONDARY_EXEC_ENABLE_INVPCID
must already have been set in the current vmcs by
vmx_secondary_exec_control().

If vmx_invpcid_supported() is false, there is no need to clear
SECONDARY_EXEC_ENABLE_INVPCID.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29541bb8

KVM: VMX: simplify rdtscp handling in vmx_cpuid_update() · f36201e5

Xiao Guangrong authored Sep 09, 2015

If vmx_rdtscp_supported() is true, SECONDARY_EXEC_RDTSCP must
already have been set in the current vmcs by
vmx_secondary_exec_control().
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f36201e5

KVM: VMX: drop rdtscp_enabled check in prepare_vmcs02() · e2821620

Xiao Guangrong authored Sep 09, 2015

SECONDARY_EXEC_RDTSCP set for L2 guest comes from vmcs12
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e2821620

KVM: x86: add pcommit support · 8b3e34e4

Xiao Guangrong authored Sep 09, 2015

Pass PCOMMIT CPU feature to guest to enable PCOMMIT instruction

Currently we do not catch pcommit instruction for L1 guest and
allow L1 to catch this instruction for L2 if, as required by the spec,
L1 can enumerate the PCOMMIT instruction via CPUID:
| IA32_VMX_PROCBASED_CTLS2[53] (which enumerates support for the
| 1-setting of PCOMMIT exiting) is always the same as
| CPUID.07H:EBX.PCOMMIT[bit 22]. Thus, software can set PCOMMIT exiting
| to 1 if and only if the PCOMMIT instruction is enumerated via CPUID

The spec can be found at
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8b3e34e4
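
The spec excerpt quoted in the pcommit commit above ties two enumeration bits together: bit 22 of CPUID.07H:EBX and bit 53 of IA32_VMX_PROCBASED_CTLS2. A small sketch of that consistency check, assuming the register values have already been read and are passed in as plain integers; the function names are hypothetical and nothing here executes CPUID or RDMSR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the given bit position is set in value. */
bool bit_is_set(uint64_t value, unsigned int bit)
{
    assert(bit < 64);
    return (value >> bit) & 1u;
}

/* CPUID.07H:EBX.PCOMMIT is bit 22, per the quoted spec text. */
bool cpuid_has_pcommit(uint32_t cpuid_07_ebx)
{
    return bit_is_set(cpuid_07_ebx, 22);
}

/* IA32_VMX_PROCBASED_CTLS2[53] enumerates the 1-setting of PCOMMIT exiting. */
bool vmx_allows_pcommit_exiting(uint64_t procbased_ctls2)
{
    return bit_is_set(procbased_ctls2, 53);
}

/* The quoted text says the two bits are always the same. */
bool pcommit_enumeration_consistent(uint32_t cpuid_07_ebx,
                                    uint64_t procbased_ctls2)
{
    return cpuid_has_pcommit(cpuid_07_ebx) ==
           vmx_allows_pcommit_exiting(procbased_ctls2);
}
```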

KVM: x86: allow guest to use cflushopt and clwb · eb1c31b4

Xiao Guangrong authored Sep 09, 2015

Pass these CPU features to guest to enable them in guest

They are needed by nvdimm drivers
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

eb1c31b4

KVM: vmx: disable posted interrupts if no local APIC · d6a858d1

Paolo Bonzini authored Sep 28, 2015

Uniprocessor 32-bit randconfigs can disable the local APIC, and posted
interrupts require reserving a vector on the LAPIC, so they are
incompatible.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d6a858d1

kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support · 9eec50b8

Andrey Smetanin authored Sep 16, 2015

The HV_X64_MSR_VP_RUNTIME MSR is used by the guest to get
"the time the virtual processor consumes running guest code,
and the time the associated logical processor spends running
hypervisor code on behalf of that guest."

Calculation of this time is performed by task_cputime_adjusted()
for vcpu task.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9eec50b8

kvm/x86: Hyper-V HV_X64_MSR_VP_INDEX export for QEMU. · 11c4b1ca

Andrey Smetanin authored Sep 16, 2015

Insert Hyper-V HV_X64_MSR_VP_INDEX into msr's emulated list,
so QEMU can set Hyper-V features cpuid HV_X64_MSR_VP_INDEX_AVAILABLE
bit correctly. KVM emulation part is in place already.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

11c4b1ca

kvm/x86: Hyper-V HV_X64_MSR_RESET msr · e516cebb

Andrey Smetanin authored Sep 16, 2015

The HV_X64_MSR_RESET MSR is used by Hyper-V based Windows guests
to request a reset of the guest VM by the hypervisor.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e516cebb

kvm: add capability for any-length ioeventfds · e9ea5069

Jason Wang authored Sep 15, 2015

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e9ea5069

kvm: add tracepoint for fast mmio · 931c33b1

Jason Wang authored Sep 15, 2015

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

931c33b1

kvm: use kmalloc() instead of kzalloc() during iodev register/unregister · d3febddd

Jason Wang authored Aug 25, 2015

All fields of kvm_io_range were initialized or copied explicitly
afterwards. So switch to use kmalloc().

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d3febddd

KVM: x86: Add support for local interrupt requests from userspace · 1c1a9ce9

Steve Rutherford authored Jul 30, 2015

In order to enable userspace PIC support, the userspace PIC needs to
be able to inject local interrupts even when the APICs are in the
kernel.

KVM_INTERRUPT now supports sending local interrupts to an APIC when
APICs are in the kernel.

The ready_for_interrupt_request flag is now only set when the CPU/APIC
will immediately accept and inject an interrupt (i.e. APIC has not
masked the PIC).

When the PIC wishes to initiate an INTA cycle with, say, CPU0, it
kicks CPU0 out of the guest, and rendezvouses with CPU0 once it arrives
in userspace.

When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is
triggered, so that userspace has a chance to inject a PIC interrupt
if it had been pending.

Overall, this design can lead to a small number of spurious userspace
rendezvous. In particular, these occur whenever the PIC transitions
from low to high while it is masked, and whenever the PIC becomes
unmasked while it is low.

Note: this does not buffer more than one local interrupt in the
kernel, so the VMM needs to enter the guest in order to complete
interrupt injection before injecting an additional interrupt.

Compiles for x86.

Can pass the KVM Unit Tests.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1c1a9ce9

KVM: x86: Add EOI exit bitmap inference · b053b2ae

Steve Rutherford authored Jul 29, 2015

In order to support a userspace IOAPIC interacting with an in kernel
APIC, the EOI exit bitmaps need to be configurable.

If the IOAPIC is in userspace (i.e. the irqchip has been split), the
EOI exit bitmaps will be set whenever the GSI Routes are configured.
In particular, the low MSI routes are reservable for userspace
IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the
destination vector of the route will be set for the destination VCPU.

The intention is for the userspace IOAPICs to use the reservable MSI
routes to inject interrupts into the guest.

This is a slight abuse of the notion of an MSI Route, given that MSIs
classically bypass the IOAPIC. It might be worthwhile to add an
additional route type to improve clarity.

Compile tested for Intel x86.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b053b2ae

KVM: x86: Add KVM exit for IOAPIC EOIs · 7543a635

Steve Rutherford authored Jul 29, 2015

Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI
level-triggered IOAPIC interrupts.

Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
to be informed (which is identical to the EOI_EXIT_BITMAP field used
by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
exits on older processors).

[Note: A prototype using ResampleFDs found that decoupling the EOI
from the VCPU's thread made it possible for the VCPU to not see a
recent EOI after reentering the guest. This does not match real
hardware.]

Compile tested for Intel x86.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7543a635
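
The per-VCPU exit bitmap mentioned in the KVM_EXIT_IOAPIC_EOI commit above can be pictured as 256 bits, one per interrupt vector. The sketch below is an assumption-laden illustration: `vcpu_sketch`, `eoi_exit_bitmap_set`, and `eoi_needs_userspace_exit` are hypothetical names, and the layout (four 64-bit words) is chosen for convenience rather than copied from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Hypothetical per-VCPU state: one bit per interrupt vector. */
struct vcpu_sketch {
    uint64_t eoi_exit_bitmap[NR_VECTORS / 64];
};

/* Mark a vector as one whose EOI must be reported to the IOAPIC. */
void eoi_exit_bitmap_set(struct vcpu_sketch *vcpu, uint8_t vector)
{
    assert(vcpu != NULL);
    vcpu->eoi_exit_bitmap[vector / 64] |= 1ULL << (vector % 64);
}

/* On EOI, decide whether the IOAPIC needs to be informed for this vector. */
bool eoi_needs_userspace_exit(const struct vcpu_sketch *vcpu, uint8_t vector)
{
    assert(vcpu != NULL);
    return (vcpu->eoi_exit_bitmap[vector / 64] >> (vector % 64)) & 1u;
}
```

Vectors whose bit is clear never cause an exit, which is how the bitmap elides unnecessary IOAPIC EOI exits.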

KVM: x86: Split the APIC from the rest of IRQCHIP. · 49df6397

Steve Rutherford authored Jul 29, 2015

First patch in a series which enables the relocation of the
PIC/IOAPIC to userspace.

Adds capability KVM_CAP_SPLIT_IRQCHIP.

KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
rest of the irqchip.

Compile tested for x86.
Signed-off-by: Steve Rutherford <srutherford@google.com>
Suggested-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

49df6397

KVM: x86: unify handling of interrupt window · 4ca7dd8c

Paolo Bonzini authored Jul 30, 2015

The interrupt window is currently checked twice, once in vmx.c/svm.c and
once in dm_request_for_irq_injection. The only difference is the extra
check for kvm_arch_interrupt_allowed in dm_request_for_irq_injection,
and the different return value (EINTR/KVM_EXIT_INTR for vmx.c/svm.c vs.
0/KVM_EXIT_IRQ_WINDOW_OPEN for dm_request_for_irq_injection).

However, dm_request_for_irq_injection is basically dead code! Revive it
by removing the checks in vmx.c and svm.c's vmexit handlers, and
fixing the returned values for the dm_request_for_irq_injection case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4ca7dd8c

KVM: x86: introduce lapic_in_kernel · 35754c98

Paolo Bonzini authored Jul 29, 2015

Avoid pointer chasing and memory barriers, and simplify the code
when split irqchip (LAPIC in kernel, IOAPIC/PIC in userspace)
is introduced.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

35754c98

KVM: x86: replace vm_has_apicv hook with cpu_uses_apicv · d50ab6c1

Paolo Bonzini authored Jul 29, 2015

This will avoid an unnecessary trip to ->kvm and from there to the VPIC.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d50ab6c1

KVM: x86: store IOAPIC-handled vectors in each VCPU · 3bb345f3

Paolo Bonzini authored Jul 29, 2015

We can reuse the algorithm that computes the EOI exit bitmap to figure
out which vectors are handled by the IOAPIC.  The only difference
between the two is for edge-triggered interrupts other than IRQ8
that have no notifiers active; however, the IOAPIC does not have to
do anything special for these interrupts anyway.

This again limits the interactions between the IOAPIC and the LAPIC,
making it easier to move the former to userspace.

Inspired by a patch from Steve Rutherford.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3bb345f3

KVM: x86: set TMR when the interrupt is accepted · bdaffe1d

Paolo Bonzini authored Jul 29, 2015

Do not compute TMR in advance. Instead, set the TMR just before the interrupt
is accepted into the IRR. This limits the coupling between IOAPIC and LAPIC.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bdaffe1d
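
The ordering described in the TMR commit above (record the trigger mode first, then accept the interrupt into the IRR) can be sketched with two 256-bit registers. This is a simplified model under stated assumptions: `lapic_sketch` and `accept_irq_sketch` are hypothetical names, and the real local APIC emulation does considerably more at this point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of two 256-bit local APIC registers. */
struct lapic_sketch {
    uint64_t tmr[4]; /* trigger mode: bit set means level-triggered */
    uint64_t irr[4]; /* interrupt request: bit set means pending */
};

/* Update the TMR for this vector, then mark the vector pending in the IRR. */
void accept_irq_sketch(struct lapic_sketch *apic, uint8_t vector,
                       bool level_triggered)
{
    assert(apic != NULL);

    uint64_t bit = 1ULL << (vector % 64);
    size_t word = vector / 64;

    if (level_triggered)
        apic->tmr[word] |= bit;
    else
        apic->tmr[word] &= ~bit;

    apic->irr[word] |= bit;
}
```

Because the TMR bit is written at acceptance time, nothing has to precompute it from IOAPIC state, which is the decoupling the commit message describes.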

tools lib traceevent: update KVM plugin · 58219c1a

Paolo Bonzini authored Oct 01, 2015

The format of the role word has changed through the years and the
plugin was never updated; some VMX exit reasons were missing too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

58219c1a

Merge branch 'x86/for-kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD · 82f6c9cd

Paolo Bonzini authored Oct 01, 2015

This merges a cleanup of asm/apic.h, which is needed by the KVM patches
to support VT-d posted interrupts.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

82f6c9cd

Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS · d2922422

Dirk Müller authored Oct 01, 2015

The CPU feature flags are never going to change, so warning
every time can cause a lot of kernel log spam
(in our case more than 10GB/hour).

The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug.  This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases.

Cc: stable@vger.kernel.org
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d2922422
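
The effect of warning only once, as in the WARN_ON_ONCE commit above, can be shown with a userspace analogue. This is a sketch, not the kernel macro: `warn_on_once_sketch`, the caller-supplied `already_warned` flag, and the `warnings_printed` counter are hypothetical and exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (for illustration). */
static unsigned int warnings_printed;

/*
 * Print a warning the first time cond is true for a given call site,
 * tracked through the caller's flag. Always returns cond, so the
 * caller can still branch on it every time.
 */
bool warn_on_once_sketch(bool cond, bool *already_warned, const char *what)
{
    assert(already_warned != NULL);

    if (cond && !*already_warned) {
        *already_warned = true;
        warnings_printed++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}
```

A condition that stays true on every call therefore produces one log line instead of an unbounded stream, which is the log-spam fix the commit is after.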

Update KVM homepage Url · 038161de

Dirk Müller authored Oct 01, 2015

The old one appears to be a generic catch all page, which
is unhelpful.
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

038161de

Revert "KVM: SVM: use NPT page attributes" · fc07e76a

Paolo Bonzini authored Oct 01, 2015

This reverts commit 3c2e7f7d.
Initializing the mapping from MTRR to PAT values was reported to
fail nondeterministically, and it also caused extremely slow boot
(due to caching getting disabled---bug 103321) with assigned devices.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Sebastian Schuette <dracon@ewetel.net>
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fc07e76a

Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask" · bcf166a9

Paolo Bonzini authored Oct 01, 2015

This reverts commit 54928303.
It builds on the commit that is being reverted next.

Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bcf166a9