Commits · 0e60b0799fedc495a5c57dbd669de3c10d72edd2 · nexedi / linux

04 Dec, 2014 12 commits

kvm: change memslot sorting rule from size to GFN · 0e60b079

Igor Mammedov authored Dec 01, 2014

This will allow binary search to be used for GFN -> memslot
lookups, reducing lookup cost when there are many slots.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0e60b079
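
As a rough illustration of the lookup this sorting rule enables, here is a minimal sketch of a binary search over slots ordered by base GFN. The `struct slot` layout, the `find_slot` name, and the descending sort direction are assumptions made for this example and are not taken from the commit itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical, simplified slot: a contiguous range of guest frame numbers. */
struct slot {
    gfn_t base_gfn;
    uint64_t npages;
};

/*
 * Find the slot containing gfn in an array sorted by base_gfn in
 * descending order (assumed), with non-overlapping ranges.
 * Returns NULL when no slot covers gfn.
 */
static const struct slot *find_slot(const struct slot *slots, size_t n, gfn_t gfn)
{
    size_t lo = 0, hi = n;

    assert(slots != NULL || n == 0);

    /* Narrow down to the first index whose base_gfn is <= gfn. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (gfn >= slots[mid].base_gfn)
            hi = mid;
        else
            lo = mid + 1;
    }

    /* The candidate starts at or below gfn; check that it also reaches it. */
    if (lo < n && gfn < slots[lo].base_gfn + slots[lo].npages)
        return &slots[lo];
    return NULL;
}
```

With the array sorted by size instead, a lookup has no ordering on addresses to exploit and must scan linearly.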

kvm: search_memslots: add simple LRU memslot caching · d4ae84a0

Igor Mammedov authored Dec 01, 2014

In a typical guest boot workload only 2-3 memslots are used
extensively, and most lookups hit the same memslot.

Adding an LRU cache improves average lookup time from
46 to 28 cycles (~40%) for this workload.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d4ae84a0
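
The caching idea can be sketched as a single remembered index that is tried before the full search. This is only an illustration: the `slot_table` structure, the `last_hit` field, and the linear fallback scan are assumptions of this example, and "simple LRU" is read here as "remember the most recently matched slot".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 8 /* arbitrary size for this sketch */

struct cslot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Hypothetical table with a one-entry cache of the last matching slot. */
struct slot_table {
    struct cslot slots[MAX_SLOTS];
    size_t used;     /* number of valid entries in slots[] */
    size_t last_hit; /* index of the most recently matched slot */
};

static int slot_contains(const struct cslot *s, uint64_t gfn)
{
    return gfn >= s->base_gfn && gfn < s->base_gfn + s->npages;
}

static const struct cslot *lookup_cached(struct slot_table *t, uint64_t gfn)
{
    size_t i = t->last_hit;

    assert(t->used <= MAX_SLOTS);

    /* Fast path: the slot that matched last time often matches again. */
    if (i < t->used && slot_contains(&t->slots[i], gfn))
        return &t->slots[i];

    /* Slow path: full scan, remembering the slot that matched. */
    for (i = 0; i < t->used; i++) {
        if (slot_contains(&t->slots[i], gfn)) {
            t->last_hit = i;
            return &t->slots[i];
        }
    }
    return NULL;
}
```

A miss leaves `last_hit` untouched, so one stray lookup does not evict the hot slot.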

kvm: update_memslots: drop not needed check for the same slot · 7f379cff

Igor Mammedov authored Dec 01, 2014

The UP/DOWN shift loops will shift the array in the needed
direction and stop at the place where the new slot should
be placed, regardless of the old slot's size.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7f379cff

kvm: update_memslots: drop not needed check for the same number of pages · 5a38b6e6

Igor Mammedov authored Dec 01, 2014

If the number of pages hasn't changed, the sorting algorithm
will do nothing, so there is no need for an extra check
to avoid entering the sorting logic.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5a38b6e6

KVM: x86: allow 256 logical x2APICs again · 45c3094a

Radim Krčmář authored Nov 27, 2014

While fixing an x2apic bug,
 17d68b76 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
we've made only one cluster available.  This means that the number of
logically addressable x2APICs was reduced to 16 and VCPUs kept
overwriting themselves in that region, so even the first cluster wasn't
set up correctly.

This patch extends x2APIC support back to the logical_map's limit, and
keeps the CVE fixed as messages for non-present APICs are dropped.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

45c3094a
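
To make the 16-versus-256 arithmetic concrete, the sketch below derives an x2APIC logical ID from an x2APIC ID (cluster in bits 31:16, one-hot position in bits 15:0, as described in the Intel SDM) and maps it to an index in a 16 x 16 table. The function names and the 16 x 16 map dimensions are assumptions of this example, chosen to match the 256 in the commit title.

```c
#include <stdint.h>

#define CLUSTERS          16 /* assumed map dimensions: 16 x 16 = 256 */
#define APICS_PER_CLUSTER 16

/* Logical ID = (ID[19:4] << 16) | (1 << ID[3:0]). */
static uint32_t x2apic_logical_id(uint32_t apic_id)
{
    return (((apic_id >> 4) & 0xffff) << 16) | (UINT32_C(1) << (apic_id & 0xf));
}

/*
 * Map a logical ID to cluster * 16 + position, or -1 when the cluster
 * is outside the table or no position bit is set.
 */
static int logical_map_index(uint32_t ldr)
{
    uint32_t cluster = ldr >> 16;
    uint32_t mask = ldr & 0xffff;
    int pos = 0;

    if (cluster >= CLUSTERS || mask == 0)
        return -1;

    /* Position is the index of the lowest set bit in the low 16 bits. */
    while (!(mask & 1)) {
        mask >>= 1;
        pos++;
    }
    return (int)cluster * APICS_PER_CLUSTER + pos;
}
```

If only cluster 0 is usable, every VCPU with an ID of 16 or more has no slot of its own, which is how the limit of 16 described above arises.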

KVM: x86: check bounds of APIC maps · 25995e5b

Radim Krčmář authored Nov 27, 2014

They can't be violated now, but play it safe for the future.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

25995e5b

KVM: x86: fix APIC physical destination wrapping · fa834e91

Radim Krčmář authored Nov 27, 2014

x2apic allows destinations > 0xff and we don't want them delivered to
lower APICs.  They are correctly handled by doing nothing.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa834e91
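
The wrapping problem can be illustrated with a 256-entry physical map. In this sketch, `struct apic`, `struct apic_map`, and both lookup functions are hypothetical; the masking variant is an assumption about what "wrapping" refers to, not a copy of the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define PHYS_MAP_SIZE 256

/* Hypothetical stand-ins for the real APIC structures. */
struct apic {
    uint32_t id;
};

struct apic_map {
    struct apic *phys_map[PHYS_MAP_SIZE];
};

/* Wrapping lookup: destination 0x101 aliases APIC 0x01. */
static struct apic *phys_lookup_wrapping(const struct apic_map *m, uint32_t dest)
{
    return m->phys_map[dest & 0xff];
}

/* Bounds-checked lookup: out-of-range destinations match nothing. */
static struct apic *phys_lookup_checked(const struct apic_map *m, uint32_t dest)
{
    if (dest >= PHYS_MAP_SIZE)
        return NULL;
    return m->phys_map[dest];
}
```

Returning NULL for an out-of-range destination corresponds to "handled by doing nothing": the message is simply not delivered.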

KVM: x86: deliver phys lowest-prio · 085563fb

Radim Krčmář authored Nov 27, 2014

Physical mode can't address more than one APIC, but lowest-prio is
allowed, so we just reuse our paths.

SDM 10.6.2.1 Physical Destination:
  Also, for any non-broadcast IPI or I/O subsystem initiated interrupt
  with lowest priority delivery mode, software must ensure that APICs
  defined in the interrupt address are present and enabled to receive
  interrupts.

We could warn on top of that.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

085563fb

KVM: x86: don't retry hopeless APIC delivery · 698f9755

Radim Krčmář authored Nov 27, 2014

False from kvm_irq_delivery_to_apic_fast() means that we don't handle it
in the fast path, but we still return false in cases that were perfectly
handled.  Fix that.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

698f9755

KVM: x86: use MSR_ICR instead of a number · decdc283

Radim Krčmář authored Nov 26, 2014

0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

decdc283

KVM: x86: Fix reserved x2apic registers · c69d3d9b

Nadav Amit authored Nov 26, 2014

x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC
Register Address Space"). KVM needs to cause #GP on such accesses.

Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c69d3d9b
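
The relation between xAPIC MMIO offsets and x2APIC MSR numbers used in the two commits above (0x300 becomes 0x830) can be written out as a small sketch. The constants follow the usual xAPIC register offsets; the helper names are made up for this example, and the reserved check covers only the two registers named in the commit message.

```c
#include <stdint.h>

#define APIC_BASE_MSR 0x800 /* first x2APIC MSR */
#define APIC_DFR      0x0e0 /* xAPIC MMIO offsets */
#define APIC_ICR      0x300
#define APIC_ICR2     0x310

/* Each 16-byte-aligned MMIO register maps to one MSR. */
static uint32_t x2apic_msr_for_offset(uint32_t mmio_offset)
{
    return APIC_BASE_MSR + (mmio_offset >> 4);
}

/* DFR and ICR2 have no x2APIC counterpart, so access should raise #GP. */
static int x2apic_offset_is_reserved(uint32_t mmio_offset)
{
    return mmio_offset == APIC_DFR || mmio_offset == APIC_ICR2;
}
```

In x2APIC mode the ICR is a single 64-bit MSR, which is why the separate ICR2 slot (0x831) is left unused.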

KVM: x86: Generate #UD when memory operand is required · 39f062ff

Nadav Amit authored Nov 26, 2014

Certain x86 instructions that use modrm operands only allow a memory operand
(i.e., mod 0, 1 or 2), and cause a #UD exception otherwise. KVM ignores this
fact. Currently, the affected instructions that KVM emulates are MOVBE,
MOVNTPS, MOVNTPD and MOVNTI.  MOVBE is the most blatant example, since it may
be emulated by the host regardless of MMIO.

The fix introduces a new group for handling such instructions, marking mod3 as
an illegal instruction.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

39f062ff
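
The mod test described above can be sketched as follows. The helper names are hypothetical; the only facts relied on are that the mod field occupies the top two bits of the ModRM byte, that mod 3 selects a register operand, and that #UD is vector 6.

```c
#include <stdint.h>

#define UD_VECTOR 6 /* invalid-opcode exception */

/* The mod field is bits 7:6 of the ModRM byte. */
static unsigned modrm_mod(uint8_t modrm)
{
    return modrm >> 6;
}

/* mod 0, 1 and 2 encode memory operands; mod 3 encodes a register. */
static int modrm_is_memory_operand(uint8_t modrm)
{
    return modrm_mod(modrm) != 3;
}

/* Returns 0 when decoding may continue, or the vector to inject. */
static int check_memory_only_insn(uint8_t modrm)
{
    return modrm_is_memory_operand(modrm) ? 0 : UD_VECTOR;
}
```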

03 Dec, 2014 1 commit

Merge tag 'kvm-s390-next-20141128' of... · be06b6be

Paolo Bonzini authored Dec 03, 2014

Merge tag 'kvm-s390-next-20141128' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Several fixes, cleanups and reworks

Here is a bunch of fixes that deal mostly with architectural compliance:
- interrupt priorities
- interrupt handling
- instruction exit handling

We also provide a helper function for getting the guest visible storage key.

be06b6be

28 Nov, 2014 11 commits

KVM: s390: allow injecting all kinds of machine checks · fc2020cf

Jens Freimann authored Aug 13, 2014

Allow specifying CR14, logout area, external damage code
and failed storage address.

Since more than one machine check can be indicated to the guest at
a time, we need to combine all indication bits with already pending
requests.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

fc2020cf

KVM: s390: handle pending local interrupts via bitmap · 383d0b05

Jens Freimann authored Jul 29, 2014

This patch adapts handling of local interrupts to be more compliant with
the z/Architecture Principles of Operation and introduces a data
structure which allows more efficient handling of interrupts.

* Get rid of the li->active flag, use a bitmap instead
* Keep interrupts in a bitmap instead of a list
* Deliver interrupts in the order of their priority as defined in the
  PoP
* Use a second bitmap for sigp emergency requests, as a CPU can have
  one request pending from every other CPU in the system.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

383d0b05

KVM: s390: add bitmap for handling cpu-local interrupts · c0e6159d

Jens Freimann authored Jul 29, 2013

Add a bitmap to the vcpu structure which is used to keep track
of local pending interrupts. Also add an enum with all interrupt
types sorted in order of priority (highest to lowest).
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c0e6159d
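
The combination of a priority-ordered enum and a pending bitmap can be sketched like this. The enum members are placeholders, not the real s390 interrupt types, and the helpers are hypothetical; the point is only that scanning from bit 0 upward delivers in priority order when the enum is sorted from highest to lowest priority.

```c
#include <assert.h>

/* Placeholder interrupt types, sorted from highest to lowest priority. */
enum irq_type {
    IRQ_HIGHEST = 0,
    IRQ_MIDDLE,
    IRQ_LOWEST,
    IRQ_COUNT
};

/* Mark one interrupt type as pending; setting it twice has no extra effect. */
static void irq_set_pending(unsigned long *pending, enum irq_type t)
{
    assert(t < IRQ_COUNT);
    *pending |= 1UL << t;
}

/*
 * Clear and return the highest-priority pending type, or -1 if none.
 * Lower bit numbers are checked first, matching the enum ordering.
 */
static int irq_take_next(unsigned long *pending)
{
    int t;

    for (t = 0; t < IRQ_COUNT; t++) {
        if (*pending & (1UL << t)) {
            *pending &= ~(1UL << t);
            return t;
        }
    }
    return -1;
}
```

Compared with a list, the bitmap needs no allocation on injection, and delivery order no longer depends on arrival order.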

KVM: s390: refactor interrupt delivery code · 0fb97abe

Jens Freimann authored Jul 29, 2014

Move delivery code for cpu-local interrupts from the huge do_deliver_interrupt()
to smaller functions which handle one type of interrupt.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

0fb97abe

KVM: s390: add defines for virtio and pfault interrupt code · 60f90a14

Jens Freimann authored Nov 10, 2014

Get rid of open-coded values for virtio and pfault completion interrupts.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

60f90a14

KVM: s390: external param not valid for cpu timer and ckc · af43eb2f

David Hildenbrand authored Nov 07, 2014

The 32bit external interrupt parameter is only valid for timing-alert and
service-signal interrupts.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

af43eb2f

KVM: s390: refactor interrupt injection code · 0146a7b0

Jens Freimann authored Jul 28, 2014

In preparation for the rework of the local interrupt injection code,
factor out injection routines from kvm_s390_inject_vcpu().
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

0146a7b0

KVM: S390: Create helper function get_guest_storage_key · 9fcf93b5

Jason J. Herne authored Sep 23, 2014

Define get_guest_storage_key which can be used to get the value of a guest
storage key. This complements the functionality provided by the helper function
set_guest_storage_key. Both functions are needed for live migration of s390
guests that use storage keys.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

9fcf93b5

KVM: s390: trigger the right CPU exit for floating interrupts · da00fcbd

Christian Borntraeger authored Nov 21, 2014

When injecting a floating interrupt and no CPU is idle we
kick one CPU to do an external exit. In case of I/O we
should trigger an I/O exit instead. This does not matter
for Linux guests as external and I/O interrupts are
enabled/disabled at the same time, but play safe anyway.

The same holds true for machine checks. Since there is no
special exit, just reuse the generic stop exit. The injection
code inside the VCPU loop will recheck anyway and rearm the
proper exits (e.g. control registers) if necessary.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

da00fcbd

KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instruction · 04b41acd

Thomas Huth authored Nov 12, 2014

A couple of our interception handlers rewind the PSW to the beginning
of the instruction to run the intercepted instruction again during the
next SIE entry. This normally works fine, but there is also the
possibility that the instruction did not get run directly but via an
EXECUTE instruction.
In this case, the PSW does not point to the instruction that caused the
interception, but to the EXECUTE instruction! So we've got to rewind the
PSW to the beginning of the EXECUTE instruction instead.
This is now accomplished with a new helper function kvm_s390_rewind_psw().
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

04b41acd
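
The rewind arithmetic can be illustrated with a small sketch. It relies on the z/Architecture rule that the two leftmost bits of the first opcode byte give the instruction length (2, 4 or 6 bytes). The `rewind_addr` function and its parameters are hypothetical and do not show how the kernel detects the EXECUTE case; address wrapping for 24-bit and 31-bit modes is ignored here.

```c
#include <stdint.h>

/* Instruction length from the two leftmost bits of the first opcode byte. */
static int insn_length(uint8_t opcode)
{
    switch (opcode >> 6) {
    case 0:
        return 2;
    case 3:
        return 6;
    default:
        return 4;
    }
}

/*
 * Compute the address to restart at. When the intercepted instruction was
 * the target of an EXECUTE, the PSW points past the EXECUTE instruction,
 * so the EXECUTE's own length must be subtracted instead.
 */
static uint64_t rewind_addr(uint64_t psw_addr, uint8_t intercepted_opcode,
                            int via_execute, uint8_t execute_opcode)
{
    uint8_t op = via_execute ? execute_opcode : intercepted_opcode;

    return psw_addr - (uint64_t)insn_length(op);
}
```

For example, a 4-byte instruction run through a 6-byte EXECUTE RELATIVE LONG (first opcode byte 0xc6) at 0x2000 leaves the PSW at 0x2006; subtracting 4 would land in the middle of the EXECUTE instruction.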

KVM: s390: Small fixes for the PFMF handler · a02689fe

Thomas Huth authored Nov 10, 2014

This patch includes two small fixes for the PFMF handler: First, the
start address for PFMF has to be masked according to the current
addressing mode, which is now done with kvm_s390_logical_to_effective().
Second, the protection exceptions have a lower priority than the
specification exceptions, so the check for low-address protection
has to be moved after the last spot where we inject a specification
exception.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

a02689fe
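
Masking an address according to the addressing mode amounts to keeping the low 24 or 31 bits, or all 64. The sketch below shows that step in isolation; the enum and function name are made up for the example and are not the signature of kvm_s390_logical_to_effective().

```c
#include <stdint.h>

/* Hypothetical representation of the three z/Architecture addressing modes. */
enum addr_mode {
    AMODE_24,
    AMODE_31,
    AMODE_64
};

/* Keep only the address bits that are significant in the given mode. */
static uint64_t mask_for_amode(uint64_t addr, enum addr_mode mode)
{
    switch (mode) {
    case AMODE_24:
        return addr & UINT64_C(0x0000000000ffffff);
    case AMODE_31:
        return addr & UINT64_C(0x000000007fffffff);
    default:
        return addr;
    }
}
```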

24 Nov, 2014 2 commits

kvm: x86: avoid warning about potential shift wrapping bug · 2b4a273b

Paolo Bonzini authored Nov 24, 2014

cs.base is declared as a __u64 variable and vector is a u32 so this
causes a static checker warning.  The user indeed can set "sipi_vector"
to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the
value should really have 8-bit precision only.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2b4a273b
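
The warning concerns a 32-bit shift whose result is then widened to 64 bits. A minimal sketch of the difference, with hypothetical function names and assuming the shift by 12 used for the SIPI start address:

```c
#include <stdint.h>

/*
 * The shift is evaluated in 32 bits before the result is widened, so any
 * bits shifted past bit 31 are lost.
 */
static uint64_t sipi_base_wrapping(uint32_t vector)
{
    return vector << 12;
}

/* Restricting the vector to 8 bits makes the result fit in 20 bits. */
static uint64_t sipi_base_8bit(uint8_t vector)
{
    return (uint64_t)vector << 12;
}
```

With an 8-bit vector, the largest possible base is 0xff000, so no wrapping can occur regardless of the width used for the shift.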

KVM: x86: move device assignment out of kvm_host.h · c9eab58f

Paolo Bonzini authored Nov 24, 2014

Create a new header, and hide the device assignment functions there.
Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying
arch/x86/kvm/iommu.c to take a PCI device struct.

Based on a patch by Radim Krcmar <rkrcmark@redhat.com>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c9eab58f

23 Nov, 2014 2 commits

kvm: x86: mask out XSAVES · b65d6e17

Paolo Bonzini authored Nov 21, 2014

This feature is not supported inside KVM guests yet, because we do not emulate
MSR_IA32_XSS.  Mask it out.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b65d6e17

kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ · c274e03a

Radim Krčmář authored Nov 21, 2014

Now that ia64 is gone, we can hide deprecated device assignment in x86.

Notable changes:
 - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

The easy parts were removed from generic kvm code, remaining
 - kvm_iommu_(un)map_pages() would require new code to be moved
 - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c274e03a

21 Nov, 2014 3 commits

kvm: remove IA64 ioctls · 6b397158

Radim Krčmář authored Nov 20, 2014

KVM ia64 is no longer present so new applications shouldn't use them.
The main problem is that they most likely didn't work even before,
because of a conflict in the #defines:

  #define KVM_SET_GUEST_DEBUG       _IOW(KVMIO,  0x9b, struct kvm_guest_debug)
  #define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)

The argument to KVM_SET_GUEST_DEBUG is:

  struct kvm_guest_debug {
  	__u32 control;
  	__u32 pad;
  	struct kvm_guest_debug_arch arch;
  };

  struct kvm_guest_debug_arch {
  };

meaning that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8
and KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK.

KVM_SET_GUEST_DEBUG is handled in virt/kvm/kvm_main.c before even calling
kvm_arch_vcpu_ioctl (which would have handled KVM_IA64_VCPU_SET_STACK),
so KVM_IA64_VCPU_SET_STACK would just return -EINVAL.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6b397158
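
The collision can be reproduced by encoding both numbers by hand. This sketch assumes the generic Linux ioctl layout (direction in bits 31:30, size in bits 29:16, type in bits 15:8, number in bits 7:0) and a KVMIO value of 0xAE. The `iow` helper and `struct guest_debug` are stand-ins written for this example; the empty arch struct is left out because empty structs are not standard C, and it contributes no bytes anyway.

```c
#include <stdint.h>

#define KVMIO     0xAE
#define IOC_WRITE 1u

/* Stand-in for struct kvm_guest_debug with an empty arch part. */
struct guest_debug {
    uint32_t control;
    uint32_t pad;
};

/* Hand-rolled equivalent of _IOW() under the generic encoding. */
static uint32_t iow(uint32_t type, uint32_t nr, uint32_t size)
{
    return (IOC_WRITE << 30) | (size << 16) | (type << 8) | nr;
}
```

Since `sizeof(struct guest_debug)` is 8 and a pointer is 8 bytes on ia64, `iow(KVMIO, 0x9b, 8)` yields the same value for both requests, so the kernel cannot tell them apart.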

kvm: remove CONFIG_X86 #ifdefs from files formerly shared with ia64 · 3bf58e9a

Radim Krcmar authored Nov 21, 2014

Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3bf58e9a

kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/ · 6ef768fa

Paolo Bonzini authored Nov 20, 2014

ia64 does not need them anymore.  Ack notifiers become x86-specific
too.
Suggested-by: Gleb Natapov <gleb@kernel.org>
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6ef768fa

20 Nov, 2014 2 commits

kvm: Documentation: remove ia64 · c32a4272

Tiejun Chen authored Nov 20, 2014

kvm/ia64 is gone, clean up Documentation too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c32a4272

KVM: ia64: remove · 003f7de6

Paolo Bonzini authored Nov 19, 2014

KVM for ia64 has been marked as broken not just once, but twice even,
and the last patch from the maintainer is now roughly 5 years old.
Time for it to rest in peace.
Acked-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

003f7de6

19 Nov, 2014 7 commits

KVM: x86: Remove FIXMEs in emulate.c · 86619e7b

Nicholas Krause authored Nov 19, 2014

Remove FIXME comments about needing fault addresses to be returned.  These
are propagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

86619e7b

KVM: emulator: remove duplicated limit check · 997b0412

Paolo Bonzini authored Nov 19, 2014

The check on the higher limit of the segment and the check on the
maximum accessible size are the same for both expand-up and
expand-down segments.  Only the computation of "lim" varies.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

997b0412

KVM: emulator: remove code duplication in register_address{,_increment} · 01485a22

Paolo Bonzini authored Nov 19, 2014

register_address has been a duplicate of address_mask ever since the
ancestor of __linearize was born in 90de84f5 (KVM: x86 emulator:
preserve an operand's segment identity, 2010-11-17).

However, we can put it to a better use by including the call to reg_read
in register_address.  Similarly, the call to reg_rmw can be moved to
register_address_increment.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

01485a22

KVM: x86: Move __linearize masking of la into switch · 31ff6488

Nadav Amit authored Nov 19, 2014

In __linearize there is a check of whether the linear address needs to be
masked.  It occurs immediately after a switch that evaluates the same
condition.  Merge them.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

31ff6488

KVM: x86: Non-canonical access using SS should cause #SS · abc7d8a4

Nadav Amit authored Nov 19, 2014

When SS is used with a non-canonical address, an #SS exception is generated on
real hardware.  The KVM emulator causes a #GP instead. Fix it to behave like a
real x86 CPU.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

abc7d8a4
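
The choice of exception can be sketched as a canonical-address test followed by a check on which segment was used. The sketch assumes 48-bit virtual addresses and uses hypothetical helper names; #SS is vector 12 and #GP is vector 13.

```c
#include <stdint.h>

#define SS_VECTOR 12 /* stack-segment fault */
#define GP_VECTOR 13 /* general-protection fault */

/* Canonical for 48-bit addressing: bits 63:47 are all zero or all one. */
static int is_canonical48(uint64_t la)
{
    uint64_t top = la >> 47;

    return top == 0 || top == 0x1ffff;
}

/* Returns -1 for a canonical address, otherwise the vector to inject. */
static int noncanonical_fault(uint64_t la, int seg_is_ss)
{
    if (is_canonical48(la))
        return -1;
    return seg_is_ss ? SS_VECTOR : GP_VECTOR;
}
```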

KVM: x86: Perform limit checks when assigning EIP · d50eaa18

Nadav Amit authored Nov 19, 2014

If a branch (e.g., jmp, ret) causes a limit violation because the target IP >
limit, the #GP exception occurs before the branch. In other words, the RIP
pushed on the stack should be that of the branch and not that of the target.

To do so, we can call __linearize with the new EIP, which also saves us the
code which performs the canonical address checks. In the case of assigning an
EIP >= 2^32 (when switching cs.l), we are also safe, as __linearize will check
that the new EIP does not exceed the limit and would trigger #GP(0) otherwise.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d50eaa18

KVM: x86: Emulator performs privilege checks on __linearize · a7315d2f

Nadav Amit authored Nov 19, 2014

When a segment is accessed, real hardware does not perform any privilege level
checks. In contrast, the KVM emulator does. This causes some discrepancies from
real hardware. For instance, reading from a readable code segment may fail due
to incorrect segment checks. In addition, it introduces unnecessary overhead.

To reference Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked
when the segment selector of a segment descriptor is loaded into a segment
register." The SDM never mentions privilege level checks during memory access,
except for loading far pointers in section 5.10 ("Pointer Validation"). Those
are actually segment selector loads and are emulated in a similar way (i.e.,
regardless of the __linearize checks).

This behavior was also checked using sysexit. A data-segment whose DPL=0 was
loaded, and after sysexit (CPL=3) it is still accessible.

Therefore, all the privilege level checks in __linearize are removed.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a7315d2f