Commits · e30492bbe95a2495930aa7db7eacde5141e45332 · nexedi / linux

30 May, 2014 20 commits

MIPS: KVM: Rewrite count/compare timer emulation · e30492bb

James Hogan authored May 29, 2014

Previously the emulation of the CPU timer was just enough to get a Linux
guest running but some shortcuts were taken:
 - The guest timer interrupt was hard coded to always happen every 10 ms
   rather than being timed to when CP0_Count would match CP0_Compare.
 - The guest's CP0_Count register was based on the host's CP0_Count
   register. This isn't very portable and fails on cores without a
   CP_Count register implemented such as Ingenic XBurst. It also meant
   that the guest's CP0_Cause.DC bit to disable the CP0_Count register
   took no effect.
 - The guest's CP0_Count register was emulated by just dividing the
   host's CP0_Count register by 4. This resulted in continuity problems
   when used as a clock source, since when the host CP0_Count overflows
   from 0x7fffffff to 0x80000000, the guest CP0_Count transitions
   discontinuously from 0x1fffffff to 0xe0000000.

Therefore rewrite & fix emulation of the guest timer based on the
monotonic kernel time (i.e. ktime_get()). Internally a 32-bit count_bias
value is added to the frequency scaled nanosecond monotonic time to get
the guest's CP0_Count. The frequency of the timer is initialised to
100MHz and cannot yet be changed, but a later patch will allow the
frequency to be configured via the KVM_{GET,SET}_ONE_REG ioctl
interface.

The timer can now be stopped via the CP0_Cause.DC bit (by the guest or
via the KVM_SET_ONE_REG ioctl interface), at which point the current
CP0_Count is stored and can be read directly. When it is restarted the
bias is recalculated such that the CP0_Count value is continuous.

Due to the nature of hrtimer interrupts any read of the guest's
CP0_Count register while it is running triggers a check for whether the
hrtimer has expired, so that the guest/userland cannot observe the
CP0_Count passing CP0_Compare without queuing a timer interrupt. This is
also taken advantage of when stopping the timer to ensure that a pending
timer interrupt is queued.

This replaces the implementation of:
 - Guest read of CP0_Count
 - Guest write of CP0_Count
 - Guest write of CP0_Compare
 - Guest write of CP0_Cause
 - Guest read of HWR 2 (CC) with RDHWR
 - Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface
 - Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface
 - Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface
 - Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e30492bb

MIPS: KVM: Migrate hrtimer to follow VCPU · 3a0ba774

James Hogan authored May 29, 2014

When a VCPU is scheduled in on a different CPU, refresh the hrtimer used
for emulating count/compare so that it gets migrated to the same CPU.

This should prevent a timer interrupt occurring on a different CPU to
where the guest it relates to is running, which would cause the guest
timer interrupt not to be delivered until after the next guest exit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3a0ba774

MIPS: KVM: Fix timer race modifying guest CP0_Cause · c73c99b0

James Hogan authored May 29, 2014

The hrtimer callback for guest timer timeouts sets the guest's
CP0_Cause.TI bit to indicate to the guest that a timer interrupt is
pending, however there is no mutual exclusion implemented to prevent
this occurring while the guest's CP0_Cause register is being
read-modify-written elsewhere.

When this occurs the setting of the CP0_Cause.TI bit is undone and the
guest misses the timer interrupt and doesn't reprogram the CP0_Compare
register for the next timeout. Currently another timer interrupt will be
triggered again in another 10ms anyway due to the way timers are
emulated, but after the MIPS timer emulation is fixed this would result
in Linux guest time standing still and the guest scheduler not being
invoked until the guest CP0_Count has looped around again, which at
100MHz takes just under 43 seconds.

Currently this is the only asynchronous modification of guest registers,
therefore it is fixed by adjusting the implementations of the
kvm_set_c0_guest_cause(), kvm_clear_c0_guest_cause(), and
kvm_change_c0_guest_cause() macros which are used for modifying the
guest CP0_Cause register to use ll/sc to ensure atomic modification.
This should work in both UP and SMP cases without requiring interrupts
to be disabled.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c73c99b0

MIPS: KVM: Deliver guest interrupts after local_irq_disable() · 044f0f03

James Hogan authored May 29, 2014

When about to run the guest, deliver guest interrupts after disabling
host interrupts. This should prevent an hrtimer interrupt from being
handled after delivering guest interrupts, and therefore not delivering
the guest timer interrupt until after the next guest exit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

044f0f03

MIPS: KVM: Add CP0_HWREna KVM register access · 16fd5c1d

James Hogan authored May 29, 2014

Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
HWREna register. This is so that userland can save and restore its
value so that RDHWR instructions don't have to be emulated by the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

16fd5c1d

MIPS: KVM: Add CP0_UserLocal KVM register access · 7767b7d2

James Hogan authored May 29, 2014

Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
UserLocal register. This is so that userland can save and restore its
value.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7767b7d2

MIPS: KVM: Add CP0_Count/Compare KVM register access · f8be02da

James Hogan authored May 29, 2014

Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
Count and Compare registers. These registers are special in that writing
to them has side effects (adjusting the time until the next timer
interrupt) and reading of Count depends on the time. Therefore add a
couple of callbacks so that different implementations (trap & emulate or
VZ) can implement them differently depending on what the hardware
provides.

The trap & emulate versions mostly duplicate what happens when a T&E
guest reads or writes these registers, so it inherits the same
limitations which can be fixed in later patches.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f8be02da

MIPS: KVM: Move KVM_{GET,SET}_ONE_REG definitions into kvm_host.h · 48a3c4e4

James Hogan authored May 29, 2014

Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of
kvm_mips.c to kvm_host.h so that they can be shared between multiple
source files. This allows register access to be indirected depending on
the underlying implementation (trap & emulate or VZ).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

48a3c4e4

MIPS: KVM: Add CP0_EPC KVM register access · fb6df0cd

James Hogan authored May 29, 2014

Contrary to the comment, the guest CP0_EPC register cannot be set via
kvm_regs, since it is distinct from the guest PC. Add the EPC register
to the KVM_{GET,SET}_ONE_REG ioctl interface.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb6df0cd

MIPS: KVM: Use tlb_write_random · b5dfc6c1

James Hogan authored May 29, 2014

When MIPS KVM needs to write a TLB entry for the guest it reads the
CP0_Random register, uses it to generate the CP_Index, and writes the
TLB entry using the TLBWI instruction (tlb_write_indexed()).

However there's an instruction for that, TLBWR (tlb_write_random()) so
use that instead.

This happens to also fix an issue with Ingenic XBurst cores where the
same TLB entry is replaced each time preventing forward progress on
stores due to alternating between TLB load misses for the instruction
fetch and TLB store misses.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b5dfc6c1

MIPS: KVM: Use local_flush_icache_range to fix RI on XBurst · facaaec1

James Hogan authored May 29, 2014

MIPS KVM uses mips32_SyncICache to synchronise the icache with the
dcache after dynamically modifying guest instructions or writing guest
exception vector. However this uses rdhwr to get the SYNCI step, which
causes a reserved instruction exception on Ingenic XBurst cores.

It would seem to make more sense to use local_flush_icache_range()
instead which does the same thing but is more portable.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

facaaec1

MIPS: Export local_flush_icache_range for KVM · 90f91356

James Hogan authored May 29, 2014

Export the local_flush_icache_range function pointer for GPL modules so
that it can be used by KVM for syncing the icache after binary
translation of trapping instructions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

90f91356

MIPS: KVM: Allocate at least 16KB for exception handlers · 7006e2df

James Hogan authored May 29, 2014

Each MIPS KVM guest has its own copy of the KVM exception vector. This
contains the TLB refill exception handler at offset 0x000, the general
exception handler at offset 0x180, and interrupt exception handlers at
offset 0x200 in case Cause_IV=1. A common handler is copied to offset
0x2000 and offset 0x3000 is used for temporarily storing k1 during entry
from guest.

However the amount of memory allocated for this purpose is calculated as
0x200 rounded up to the next page boundary, which is insufficient if 4KB
pages are in use. This can lead to the common handler at offset 0x2000
being overwritten and infinitely recursive exceptions on the next exit
from the guest.

Increase the minimum size from 0x200 to 0x4000 to cover the full use of
the page.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7006e2df

Merge tag 'kvm-s390-20140530' of... · 146b2cfe

Paolo Bonzini authored May 30, 2014

Merge tag 'kvm-s390-20140530' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

1. Several minor fixes and cleanups for KVM:
2. Fix flag check for gdb support
3. Remove unnecessary vcpu start
4. Remove code duplication for sigp interrupts
5. Better DAT handling for the TPROT instruction
6. Correct addressing exception for standby memory

146b2cfe

KVM: s390: Intercept the tprot instruction · 5a5e6536

Matthew Rosato authored Jan 29, 2013

Based on original patch from Jeng-fang (Nick) Wang

When standby memory is specified for a guest Linux, but no virtual memory has
been allocated on the Qemu host backing that guest, the guest memory detection
process encounters a memory access exception which is not thrown from the KVM
handle_tprot() instruction-handler function. The access exception comes from
sie64a returning EFAULT, which then passes an addressing exception to the guest.
Unfortunately this does not the proper PSW fixup (nullifying vs.
suppressing) so the guest will get a fault for the wrong address.

Let's just intercept the tprot instruction all the time to do the right thing
and not go the page fault handler path for standby memory. tprot is only used
by Linux during startup so some exits should be ok.
Without this patch, standby memory cannot be used with KVM.
Signed-off-by: Nick Wang <jfwang@us.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

5a5e6536

KVM: s390: a VCPU is already started when delivering interrupts · 3192c639

David Hildenbrand authored May 06, 2014

This patch removes the start of a VCPU when delivering a RESTART interrupt.
Interrupt delivery is called from kvm_arch_vcpu_ioctl_run. So the VCPU is
already considered started - no need to call kvm_s390_vcpu_start. This function
will early exit anyway.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

3192c639

KVM: s390: check the given debug flags, not the set ones · 2de3bfc2

David Hildenbrand authored May 20, 2014

This patch fixes a minor bug when updating the guest debug settings.
We should check the given debug flags, not the already set ones.
Doesn't do any harm but too many (for now unused) flags could be set internally
without error.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

2de3bfc2

KVM: s390: clean up interrupt injection in sigp code · 22ff4a33

Jens Freimann authored May 14, 2014

We have all the logic to inject interrupts available in
kvm_s390_inject_vcpu(), so let's use it instead of
injecting irqs manually to the list in sigp code.

SIGP stop is special because we have to check the
action_flags before injecting the interrupt. As
the action_flags are not available in kvm_s390_inject_vcpu()
we leave the code for the stop order code untouched for now.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

22ff4a33

KVM: s390: Enable DAT support for TPROT handler · a0465f9a

Thomas Huth authored Feb 04, 2014

The TPROT instruction can be used to check the accessability of storage
for any kind of logical addresses. So far, our handler only supported
real addresses. This patch now also enables support for addresses that
have to be translated via DAT first. And while we're at it, change the
code to use the common KVM function gfn_to_hva_prot() to check for the
validity and writability of the memory page.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

a0465f9a

KVM: s390: Add a generic function for translating guest addresses · 9fbc0276

Thomas Huth authored Feb 04, 2014

This patch adds a function for translating logical guest addresses into
physical guest addresses without touching the memory at the given location.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

9fbc0276

29 May, 2014 1 commit

MIPS: KVM: remove the stale memory alias support function unalias_gfn · 356d4c20

Deng-Cheng Zhu authored May 29, 2014

The memory alias support has been removed since a1f4d395 (KVM: Remove
memory alias support). So remove unalias_gfn from the MIPS port.
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

356d4c20

27 May, 2014 4 commits

arm: Fix compile warning for psci · d6d7a95c

Christoffer Dall authored May 27, 2014

Commit e71246a2 changes psci_init from a
function returning a void to an int, but does not change the non
CONFIG_ARM_PSCI implementation to return a value, which causes a compile
warning.  Just return 0.

Cc: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Cc: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d6d7a95c

Merge tag 'kvm-arm-for-3.16' of... · 04092204

Paolo Bonzini authored May 27, 2014

Merge tag 'kvm-arm-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next

Changed for the 3.16 merge window.

This includes KVM support for PSCI v0.2 and also includes generic Linux
support for PSCI v0.2 (on hosts that advertise that feature via their
DT), since the latter depends on headers introduced by the former.

Finally there's a small patch from Marc that enables Cortex-A53 support.

04092204

KVM: x86: MOV CR/DR emulation should ignore mod · 9b88ae99

Nadav Amit authored May 25, 2014

MOV CR/DR instructions ignore the mod field (in the ModR/M byte). As the SDM
states: "The 2 bits in the mod field are ignored". Accordingly, the second
operand of these instructions is always a general purpose register.

The current emulator implementation does not do so. If the mod bits do not
equal 3, it expects the second operand to be in memory.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9b88ae99

KVM: lapic: sync highest ISR to hardware apic on EOI · fc57ac2c

Paolo Bonzini authored May 14, 2014

When Hyper-V enlightenments are in effect, Windows prefers to issue an
Hyper-V MSR write to issue an EOI rather than an x2apic MSR write.
The Hyper-V MSR write is not handled by the processor, and besides
being slower, this also causes bugs with APIC virtualization.  The
reason is that on EOI the processor will modify the highest in-service
interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of
the SDM; every other step in EOI virtualization is already done by
apic_send_eoi or on VM entry, but this one is missing.

We need to do the same, and be careful not to muck with the isr_count
and highest_isr_cache fields that are unused when virtual interrupt
delivery is enabled.

Cc: stable@vger.kernel.org
Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fc57ac2c

25 May, 2014 1 commit

arm64: KVM: Enable minimalistic support for Cortex-A53 · 1252b331

Marc Zyngier authored May 20, 2014

In order to allow KVM to run on Cortex-A53 implementations, wire the
minimal support required.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

1252b331

22 May, 2014 6 commits

KVM: vmx: DR7 masking on task switch emulation is wrong · 1f854112

Nadav Amit authored May 19, 2014

The DR7 masking which is done on task switch emulation should be in hex format
(clearing the local breakpoints enable bits 0,2,4 and 6).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1f854112

x86: fix page fault tracing when KVM guest support enabled · 65a7f03f

Dave Hansen authored May 16, 2014

I noticed on some of my systems that page fault tracing doesn't
work:

	cd /sys/kernel/debug/tracing
	echo 1 > events/exceptions/enable
	cat trace;
	# nothing shows up

I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
KVM VM, enabling that option breaks page fault tracing, and
disabling fixes it.  I tried on some old kernels and this does
not appear to be a regression: it never worked.

There are two page-fault entry functions today.  One when tracing
is on and another when it is off.  The KVM code calls do_page_fault()
directly instead of calling the traced version:

> dotraplinkage void __kprobes
> do_async_page_fault(struct pt_regs *regs, unsigned long
> error_code)
> {
>         enum ctx_state prev_state;
>
>         switch (kvm_read_and_reset_pf_reason()) {
>         default:
>                 do_page_fault(regs, error_code);
>                 break;
>         case KVM_PV_REASON_PAGE_NOT_PRESENT:

I'm also having problems with the page fault tracing on bare
metal (same symptom of no trace output).  I'm unsure if it's
related.

Steven had an alternative to this which has zero overhead when
tracing is off where this includes the standard noops even when
tracing is disabled.  I'm unconvinced that the extra complexity
of his apporach:

	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home

is worth it, expecially considering that the KVM code is already
making page fault entry slower here.  This solution is
dirt-simple.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

65a7f03f

KVM: x86: get CPL from SS.DPL · ae9fedc7

Paolo Bonzini authored May 14, 2014

CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but force CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ae9fedc7

KVM: x86: check CS.DPL against RPL during task switch · 5045b468

Paolo Bonzini authored May 15, 2014

Table 7-1 of the SDM mentions a check that the code segment's
DPL must match the selector's RPL.  This was not done by KVM,
fix it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5045b468

KVM: x86: drop set_rflags callback · fb5e336b

Paolo Bonzini authored May 15, 2014

Not needed anymore now that the CPL is computed directly
during task switch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb5e336b

KVM: x86: use new CS.RPL as CPL during task switch · 2356aaeb

Paolo Bonzini authored May 15, 2014

During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
to all the other requirements) and will be the new CPL. So far this
worked by carefully setting the CS selector and flag before doing the
task switch; setting CS.selector will already change the CPL.

However, this will not work once we get the CPL from SS.DPL, because
then you will have to set the full segment descriptor cache to change
the CPL. ctxt->ops->cpl(ctxt) will then return the old CPL during the
task switch, and the check that SS.DPL == CPL will fail.

Temporarily assume that the CPL comes from CS.RPL during task switch
to a protected-mode task. This is the same approach used in QEMU's
emulation code, which (until version 2.0) manually tracks the CPL.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2356aaeb

16 May, 2014 8 commits

Merge tag 'kvm-s390-20140516' of... · afa538f0

Paolo Bonzini authored May 16, 2014

Merge tag 'kvm-s390-20140516' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

1. Correct locking for lazy storage key handling
   A test loop with multiple CPUs triggered a race in the lazy storage
   key handling as introduced by commit 934bc131
   (KVM: s390: Allow skeys to be enabled for the current process). This
   race should not happen with Linux guests, but let's fix it anyway.
   Patch touches !/kvm/ code, but is from the s390 maintainer.

2. Better handling of broken guests
   If we detect a program check loop we stop the guest instead of
   wasting CPU cycles.

3. Better handling on MVPG emulation
   The move page handling is improved to be architecturally correct.

3. Trace point rework
   Let's rework the kvm trace points to have a common header file (for
   later perf usage) and provided a table based instruction decoder.

4. Interpretive execution of SIGP external call
   Let the hardware handle most cases of SIGP external call (IPI) and
   wire up the fixup code for the corner cases.

5. Initial preparations for the IBC facility
   Prepare the code to handle instruction blocking

afa538f0

KVM: s390: split SIE state guest prefix field · fda902cb

Michael Mueller authored May 13, 2014

This patch splits the SIE state guest prefix at offset 4
into a prefix bit field. Additionally it provides the
access functions:

 - kvm_s390_get_prefix()
 - kvm_s390_set_prefix()

to access the prefix per vcpu.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

fda902cb

s390/sclp: add sclp_get_ibc function · 570126d3

Michael Mueller authored Mar 15, 2014

The patch adds functionality to retrieve the IBC configuration
by means of function sclp_get_ibc().
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

570126d3

KVM: s390: interpretive execution of SIGP EXTERNAL CALL · 4953919f

David Hildenbrand authored Feb 21, 2014

If the sigp interpretation facility is installed, most SIGP EXTERNAL CALL
operations will be interpreted instead of intercepted. A partial execution
interception will occurr at the sending cpu only if the target cpu is in the
wait state ("W" bit in the cpuflags set). Instruction interception will only
happen in error cases (e.g. cpu addr invalid).

As a sending cpu might set the external call interrupt pending flags at the
target cpu at every point in time, we can't handle this kind of interrupt using
our kvm interrupt injection mechanism. The injection will be done automatically
by the SIE when preparing the start of the target cpu.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Thomas Huth <thuth@linux.vnet.ibm.com>
[Adopt external call injection to check for sigp interpretion]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

4953919f

KVM: s390: Use intercept_insn decoder in trace event · d26b8655

Alexander Yarygin authored Jan 30, 2014

The current trace definition doesn't work very well with the perf tool.
Perf shows a "insn_to_mnemonic not found" message. Let's handle the
decoding completely in a parseable format.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

d26b8655

KVM: s390: decoder of SIE intercepted instructions · 05db1f6e

Alexander Yarygin authored Jan 30, 2014

This patch adds a new decoder of SIE intercepted instructions.

The decoder implemented as a macro and potentially can be used in
both kernelspace and userspace.

Note that this simplified instruction decoder is only intended to be
used with the subset of instructions that may cause a SIE intercept.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

05db1f6e

KVM: s390: Use trace tables from sie.h. · 6de1bf88

Alexander Yarygin authored Jan 30, 2014

Use the symbolic translation tables from sie.h for decoding diag, sigp
and sie exit codes.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

6de1bf88

KVM: s390: add sie exit reasons tables · ceae283b

Alexander Yarygin authored Jan 30, 2014

This patch defines tables of reasons for exiting from SIE mode
in a new sie.h header file. Tables contain SIE intercepted codes,
intercepted instructions and program interruptions codes.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

ceae283b