Commits · 460df4c1fc7c00829050c08d6368dc6e6beef307 · nexedi / linux

17 Feb, 2017 1 commit

KVM: race-free exit from KVM_RUN without POSIX signals · 460df4c1

Paolo Bonzini authored Feb 08, 2017

The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:

          VCPU thread                     service thread
   --------------------------------------------------------------
        check flag
                                          set flag
                                          raise signal
        (signal handler does nothing)
        KVM_RUN

However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).

As an alternative, we can put the flag directly in kvm_run so that
KVM can see it:

          VCPU thread                     service thread
   --------------------------------------------------------------
                                          raise signal
        signal handler
          set run->immediate_exit
        KVM_RUN
          check run->immediate_exit
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

460df4c1

16 Feb, 2017 5 commits

KVM: Support vCPU-based gfn->hva cache · bbd64115

Cao, Lei authored Feb 03, 2017

Provide versions of struct gfn_to_hva_cache functions that
take vcpu as a parameter instead of struct kvm.  The existing functions
are not needed anymore, so delete them.  This allows dirty pages to
be logged in the vcpu dirty ring, instead of the global dirty ring,
for ring-based dirty memory tracking.
Signed-off-by: Lei Cao <lei.cao@stratus.com>
Message-Id: <CY1PR08MB19929BD2AC47A291FD680E83F04F0@CY1PR08MB1992.namprd08.prod.outlook.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bbd64115

KVM: use separate generations for each address space · 4bd518f1

Paolo Bonzini authored Feb 03, 2017

This will make it easier to support multiple address spaces in
kvm_gfn_to_hva_cache_init.  Instead of having to check the address
space id, we can keep on checking just the generation number.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4bd518f1

KVM: only retrieve memslots once when initializing cache · 5a2d4365

Paolo Bonzini authored Feb 03, 2017

This will make it a bit simpler to handle multiple address spaces
in gfn_to_hva_cache.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5a2d4365

KVM: VMX: use vmcs_set/clear_bits for CPU-based execution controls · 47c0152e
Paolo Bonzini authored Dec 19, 2016
```
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
```
47c0152e

ptp_kvm: try to detect hypercall availability · 4306f200

Radim Krčmář authored Feb 15, 2017

No point in registering the device if it cannot work.
The hypercall does not advertise itself, so we have to call it.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4306f200

15 Feb, 2017 15 commits

KVM: svm: inititalize hash table structures directly · 681bcea8

David Hildenbrand authored Jan 24, 2017

The hashtable and guarding spinlock are global data structures,
we can inititalize them statically.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170124212116.4568-1-david@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

681bcea8

kvm: nVMX: Refactor nested_vmx_run() · 858e25c0

Jim Mattson authored Nov 30, 2016

Nested_vmx_run is split into two parts: the part that handles the
VMLAUNCH/VMRESUME instruction, and the part that modifies the vcpu state
to transition from VMX root mode to VMX non-root mode. The latter will
be used when restoring the checkpointed state of a vCPU that was in VMX
operation when a snapshot was taken.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

858e25c0

kvm: nVMX: Split VMCS checks from nested_vmx_run() · ca0bde28

Jim Mattson authored Nov 30, 2016

The checks performed on the contents of the vmcs12 are extracted from
nested_vmx_run so that they can be used to validate a vmcs12 that has
been restored from a checkpoint.
Signed-off-by: Jim Mattson <jmattson@google.com>
[Change prepare_vmcs02 and nested_vmx_load_cr3's last argument to u32,
 to match check_vmentry_postreqs.  Update comments for singlestep
 handling. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ca0bde28

kvm: nVMX: Refactor nested_get_vmcs12_pages() · 6beb7bd5

Jim Mattson authored Nov 30, 2016

Perform the checks on vmcs12 state early, but defer the gpa->hpa lookups
until after prepare_vmcs02. Later, when we restore the checkpointed
state of a vCPU in guest mode, we will not be able to do the gpa->hpa
lookups when the restore is done.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6beb7bd5

kvm: nVMX: Refactor handle_vmptrld() · a8bc284e

Jim Mattson authored Nov 30, 2016

Handle_vmptrld is split into two parts: the part that handles the
VMPTRLD instruction, and the part that establishes the current VMCS
pointer. The latter will be used when restoring the checkpointed state
of a vCPU that had a valid VMCS pointer when a snapshot was taken.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a8bc284e

kvm: nVMX: Refactor handle_vmon() · e29acc55

Jim Mattson authored Nov 30, 2016

Handle_vmon is split into two parts: the part that handles the VMXON
instruction, and the part that modifies the vcpu state to transition
from legacy mode to VMX operation. The latter will be used when
restoring the checkpointed state of a vCPU that was in VMX operation
when a snapshot was taken.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e29acc55

kvm: nVMX: Prepare for checkpointing L2 state · cf8b84f4

Jim Mattson authored Nov 30, 2016

Split prepare_vmcs12 into two parts: the part that stores the current L2
guest state and the part that sets up the exit information fields. The
former will be used when checkpointing the vCPU's VMX state.

Modify prepare_vmcs02 so that it can construct a vmcs02 midway through
L2 execution, using the checkpointed L2 guest state saved into the
cached vmcs12 above.
Signed-off-by: Jim Mattson <jmattson@google.com>
[Rebasing: add from_vmentry argument to prepare_vmcs02 instead of using
 vmx->nested.nested_run_pending, because it is no longer 1 at the
 point prepare_vmcs02 is called. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cf8b84f4

kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection · b95234c8

Paolo Bonzini authored Dec 19, 2016

Since bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU
is blocked", 2015-09-18) the posted interrupt descriptor is checked
unconditionally for PIR.ON. Therefore we don't need KVM_REQ_EVENT to
trigger the scan and, if NMIs or SMIs are not involved, we can avoid
the complicated event injection path.

Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
there since APICv was introduced.

However, without the KVM_REQ_EVENT safety net KVM needs to be much
more careful about races between vmx_deliver_posted_interrupt and
vcpu_enter_guest. First, the IPI for posted interrupts may be issued
between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
If that happens, kvm_trigger_posted_interrupt returns true, but
smp_kvm_posted_intr_ipi doesn't do anything about it. The guest is
entered with PIR.ON, but the posted interrupt IPI has not been sent
and the interrupt is only delivered to the guest on the next vmentry
(if any). To fix this, disable interrupts before setting vcpu->mode.
This ensures that the IPI is delayed until the guest enters non-root mode;
it is then trapped by the processor causing the interrupt to be injected.

Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
and vcpu->mode = IN_GUEST_MODE. In this case, kvm_vcpu_kick is called
but it (correctly) doesn't do anything because it sees vcpu->mode ==
OUTSIDE_GUEST_MODE. Again, the guest is entered with PIR.ON but no
posted interrupt IPI is pending; this time, the fix for this is to move
the RVI update after IN_GUEST_MODE.

Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
though the second could actually happen with VT-d posted interrupts.
In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
in another vmentry which would inject the interrupt.

This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b95234c8

KVM: x86: do not scan IRR twice on APICv vmentry · 76dfafd5

Paolo Bonzini authored Dec 19, 2016

Calls to apic_find_highest_irr are scanning IRR twice, once
in vmx_sync_pir_from_irr and once in apic_search_irr. Change
sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr;
now that it does the computation, it can also do the RVI write.

In order to avoid complications in svm.c, make the callback optional.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

76dfafd5

KVM: vmx: move sync_pir_to_irr from apic_find_highest_irr to callers · 3d92789f
Paolo Bonzini authored Dec 19, 2016
```
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
```
3d92789f

KVM: x86: preparatory changes for APICv cleanups · 810e6def

Paolo Bonzini authored Dec 19, 2016

Add return value to __kvm_apic_update_irr/kvm_apic_update_irr.
Move vmx_sync_pir_to_irr around.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

810e6def

kvm: nVMX: move nested events check to kvm_vcpu_running · 0ad3bed6

Paolo Bonzini authored Dec 19, 2016

vcpu_run calls kvm_vcpu_running, not kvm_arch_vcpu_runnable,
and the former does not call check_nested_events.

Once KVM_REQ_EVENT is removed from the APICv interrupt injection
path, however, this would leave no place to trigger a vmexit
from L2 to L1, causing a missed interrupt delivery while in guest
mode.  This is caught by the "ack interrupt on exit" test in
vmx.flat.

[This does not change the calls to check_nested_events in
 inject_pending_event.  That is material for a separate cleanup.]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0ad3bed6

KVM: vmx: clear pending interrupts on KVM_SET_LAPIC · 967235d3

Paolo Bonzini authored Dec 19, 2016

Pending interrupts might be in the PI descriptor when the
LAPIC is restored from an external state; we do not want
them to be injected.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

967235d3

kvm: vmx: Use the hardware provided GPA instead of page walk · db1c056c

Paolo Bonzini authored Dec 08, 2016

As in the SVM patch, the guest physical address is passed by
VMX to x86_emulate_instruction already, so mark the GPA as available
in vcpu->arch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

db1c056c

Merge branch 'kvm-ppc-next' of... · ee106891

Paolo Bonzini authored Feb 15, 2017

Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

This brings in two fixes for potential host crashes, from Ben
Herrenschmidt and Nick Piggin.

ee106891

09 Feb, 2017 3 commits

KVM: x86: hide KVM_HC_CLOCK_PAIRING on 32 bit · 8ef81a9a

Arnd Bergmann authored Feb 09, 2017

The newly added hypercall doesn't work on x86-32:

arch/x86/kvm/x86.c: In function 'kvm_pv_clock_pairing':
arch/x86/kvm/x86.c:6163:6: error: implicit declaration of function 'kvm_get_walltime_and_clockread';did you mean 'kvm_get_time_scale'? [-Werror=implicit-function-declaration]

This adds an #ifdef around it, matching the one around the related
functions that are also only implemented on 64-bit systems.

Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8ef81a9a

Merge tag 'kvmarm-for-4.11' of... · 2e751dfb

Paolo Bonzini authored Feb 09, 2017

Merge tag 'kvmarm-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

kvmarm updates for 4.11

- GICv3 save restore
- Cache flushing fixes
- MSI injection fix for GICv3 ITS
- Physical timer emulation support

2e751dfb

KVM: PPC: Book 3S: Fix error return in kvm_vm_ioctl_create_spapr_tce() · 5982f084

Wei Yongjun authored Feb 08, 2017

Fix to return error code -ENOMEM from the memory alloc error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

5982f084

08 Feb, 2017 15 commits

PTP: add kvm PTP driver · a0e136d4

Marcelo Tosatti authored Jan 24, 2017

Add a driver with gettime method returning hosts realtime clock.
This allows Chrony to synchronize host and guest clocks with
high precision (see results below).

chronyc> sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================

To configure Chronyd to use PHC refclock, add the
following line to its configuration file:

refclock PHC /dev/ptpX poll 3 dpoll -2 offset 0

Where /dev/ptpX is the kvmclock PTP clock.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a0e136d4

kvmclock: export kvmclock clocksource and data pointers · f4066c2b

Marcelo Tosatti authored Jan 24, 2017

To be used by KVM PTP driver.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f4066c2b

KVM: arm/arm64: Emulate the EL1 phys timer registers · 7b6b4631

Jintack Lim authored Feb 03, 2017

Emulate read and write operations to CNTP_TVAL, CNTP_CVAL and CNTP_CTL.
Now VMs are able to use the EL1 physical timer.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

7b6b4631

KVM: arm64: Add the EL1 physical timer access handler · c9a3c58f

Jintack Lim authored Feb 03, 2017

KVM traps on the EL1 phys timer accesses from VMs, but it doesn't handle
those traps. This results in terminating VMs. Instead, set a handler for
the EL1 phys timer access, and inject an undefined exception as an
intermediate step.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

c9a3c58f

KVM: arm/arm64: Set up a background timer for the physical timer emulation · f242adaf

Jintack Lim authored Feb 03, 2017

Set a background timer for the EL1 physical timer emulation while VMs
are running, so that VMs get the physical timer interrupts in a timely
manner.

Schedule the background timer on entry to the VM and cancel it on exit.
This would not have any performance impact to the guest OSes that
currently use the virtual timer since the physical timer is always not
enabled.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

f242adaf

KVM: arm/arm64: Set a background timer to the earliest timer expiration · fb280e97

Jintack Lim authored Feb 03, 2017

When scheduling a background timer, consider both of the virtual and
physical timer and pick the earliest expiration time.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

fb280e97

KVM: arm/arm64: Update the physical timer interrupt level · 58e0c973

Jintack Lim authored Feb 03, 2017

Now that we maintain the EL1 physical timer register states of VMs,
update the physical timer interrupt level along with the virtual one.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

58e0c973

KVM: arm/arm64: Initialize the emulated EL1 physical timer · a91d1855

Jintack Lim authored Feb 03, 2017

Initialize the emulated EL1 physical timer with the default irq number.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

a91d1855

KVM: arm/arm64: Add the EL1 physical timer context · 009a5701

Jintack Lim authored Feb 03, 2017

Add the EL1 physical timer context.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

009a5701

KVM: arm/arm64: Decouple kvm timer functions from virtual timer · 9171fa2e

Jintack Lim authored Feb 03, 2017

Now that we have a separate structure for timer context, make functions
generic so that they can work with any timer context, not just the
virtual timer context.  This does not change the virtual timer
functionality.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

9171fa2e

KVM: arm/arm64: Move cntvoff to each timer context · 90de943a

Jintack Lim authored Feb 03, 2017

Make cntvoff per each timer context. This is helpful to abstract kvm
timer functions to work with timer context without considering timer
types (e.g. physical timer or virtual timer).

This also would pave the way for ever doing adjustments of the cntvoff
on a per-CPU basis if that should ever make sense.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

90de943a

KVM: arm/arm64: Abstract virtual timer context into separate structure · fbb4aeec

Jintack Lim authored Feb 03, 2017

Abstract virtual timer context into a separate structure and change all
callers referring to timer registers, irq state and so on. No change in
functionality.

This is about to become very handy when adding the EL1 physical timer.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

fbb4aeec

KVM: arm/arm64: vgic: Stop injecting the MSI occurrence twice · 0bdbf3b0

Shanker Donthineni authored Feb 02, 2017

The IRQFD framework calls the architecture dependent function
twice if the corresponding GSI type is edge triggered. For ARM,
the function kvm_set_msi() is getting called twice whenever the
IRQFD receives the event signal. The rest of the code path is
trying to inject the MSI without any validation checks. No need
to call the function vgic_its_inject_msi() second time to avoid
an unnecessary overhead in IRQ queue logic. It also avoids the
possibility of VM seeing the MSI twice.

Simple fix, return -1 if the argument 'level' value is zero.

Cc: stable@vger.kernel.org
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

0bdbf3b0

KVM: x86: fix compilation · 80fbd89c

Paolo Bonzini authored Feb 08, 2017

Fix rebase breakage from commit 55dd00a7 ("KVM: x86: add
KVM_HC_CLOCK_PAIRING hypercall", 2017-01-24), courtesy of the
"I could have sworn I had pushed the right branch" department.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

80fbd89c

Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next · a4a741a0

Paul Mackerras authored Feb 08, 2017

This merges in a fix which touches both PPC and KVM code,
which was therefore put into a topic branch in the powerpc
tree.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

a4a741a0

07 Feb, 2017 1 commit

Merge tag 'kvm-s390-next-4.11-2' of... · 8f00067a

Paolo Bonzini authored Feb 07, 2017

Merge tag 'kvm-s390-next-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and features for 4.11 (via kvm/next)

- enable some simd extensions for guests
- enable nx for guests
- debug log for cpu model
- PER fixes
- remove bitwise annotation from ar_t
- detect guests in operation exception program check loops
- fix potential null-pointer dereference for ucontrol guests

- also contains merge for fix that went into 4.10 to avoid conflicts

8f00067a