Commits · ee66e453db13d4837a0dcf9d43efa7a88603161b · nexedi / linux

30 Apr, 2019 6 commits

KVM: lapic: Busy wait for timer to expire when using hv_timer · ee66e453

Sean Christopherson authored Apr 16, 2019

...now that VMX's preemption timer, i.e. the hv_timer, also adjusts its
programmed time based on lapic_timer_advance_ns.  Without the delay, a
guest can see a timer interrupt arrive before the requested time when
KVM is using the hv_timer to emulate the guest's interrupt.

Fixes: c5ce8235 ("KVM: VMX: Optimize tscdeadline timer latency")
Cc: <stable@vger.kernel.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ee66e453

KVM: VMX: Nop emulation of MSR_IA32_POWER_CTL · 6c6a2ab9

Liran Alon authored Apr 15, 2019

Since commits 668fffa3 ("kvm: better MWAIT emulation for guestsâ€)
and 4d5422ce ("KVM: X86: Provide a capability to disable MWAIT interceptsâ€),
KVM was modified to allow an admin to configure certain guests to execute
MONITOR/MWAIT inside guest without being intercepted by host.

This is useful in case admin wishes to allocate a dedicated logical
processor for each vCPU thread. Thus, making it safe for guest to
completely control the power-state of the logical processor.

The ability to use this new KVM capability was introduced to QEMU by
commits 6f131f13e68d ("kvm: support -overcommit cpu-pm=on|offâ€) and
2266d4431132 ("i386/cpu: make -cpu host support monitor/mwaitâ€).

However, exposing MONITOR/MWAIT to a Linux guest may cause it's intel_idle
kernel module to execute c1e_promotion_disable() which will attempt to
RDMSR/WRMSR from/to MSR_IA32_POWER_CTL to manipulate the "C1E Enable"
bit. This behaviour was introduced by commit
32e95180 ("intel_idle: export both C1 and C1Eâ€).

Becuase KVM doesn't emulate this MSR, running KVM with ignore_msrs=0
will cause the above guest behaviour to raise a #GP which will cause
guest to kernel panic.

Therefore, add support for nop emulation of MSR_IA32_POWER_CTL to
avoid #GP in guest in this scenario.

Future commits can optimise emulation further by reflecting guest
MSR changes to host MSR to provide guest with the ability to
fine-tune the dedicated logical processor power-state.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6c6a2ab9

KVM: x86: Add support of clear Trace_ToPA_PMI status · c715eb9f

Luwei Kang authored Feb 18, 2019

Let guests clear the Intel PT ToPA PMI status (bit 55 of
MSR_CORE_PERF_GLOBAL_OVF_CTRL).
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c715eb9f

KVM: x86: Inject PMI for KVM guest · 8479e04e

Luwei Kang authored Feb 18, 2019

Inject a PMI for KVM guest when Intel PT working
in Host-Guest mode and Guest ToPA entry memory buffer
was completely filled.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8479e04e

Revert "KVM: doc: Document the life cycle of a VM and its resources" · 3a1e5e4a

Radim Krčmář authored Apr 29, 2019

This reverts commit 919f6cd8.

The patch was applied twice.
The first commit is eca6be56.
Reported-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3a1e5e4a

Merge tag 'kvm-s390-next-5.2-1' of... · da8f0d97

Paolo Bonzini authored Apr 30, 2019

Merge tag 'kvm-s390-next-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features and fixes for 5.2

- VSIE crypto fixes
- new guest features for gen15
- disable halt polling for nested virtualization with overcommit

da8f0d97

29 Apr, 2019 2 commits

KVM: s390: vsie: Return correct values for Invalid CRYCB format · b2d0371d

Pierre Morel authored Apr 26, 2019

Let's use the correct validity number.

Fixes: 56019f9a ("KVM: s390: vsie: Allow CRYCB FORMAT-2")
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1556269201-22918-1-git-send-email-pmorel@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

b2d0371d

KVM: s390: vsie: Do not shadow CRYCB when no AP and no keys · bcccb8f6

Pierre Morel authored Apr 26, 2019

When the guest do not have AP instructions nor Key management
we should return without shadowing the CRYCB.

We did not check correctly in the past.

Fixes: b10bd9a2 ("s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity")
Fixes: 6ee74098 ("KVM: s390: vsie: allow CRYCB FORMAT-0")
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1556269010-22258-1-git-send-email-pmorel@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

bcccb8f6

26 Apr, 2019 2 commits

KVM: s390: provide kvm_arch_no_poll function · 8b905d28

Christian Borntraeger authored Mar 05, 2019

We do track the current steal time of the host CPUs. Let us use
this value to disable halt polling if the steal time goes beyond
a configured value.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

8b905d28

KVM: polling: add architecture backend to disable polling · cdd6ad3a

Christian Borntraeger authored Mar 05, 2019

There are cases where halt polling is unwanted. For example when running
KVM on an over committed LPAR we rather want to give back the CPU to
neighbour LPARs instead of polling. Let us provide a callback that
allows architectures to disable polling.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

cdd6ad3a

25 Apr, 2019 2 commits

KVM: s390: enable MSA9 keywrapping functions depending on cpu model · 8ec2fa52

Christian Borntraeger authored Apr 03, 2019

Instead of adding a new machine option to disable/enable the keywrapping
options of pckmo (like for AES and DEA) we can now use the CPU model to
decide. As ECC is also wrapped with the AES key we need that to be
enabled.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

8ec2fa52

KVM: s390: add deflate conversion facilty to cpu model · 4f45b90e

Christian Borntraeger authored Dec 28, 2018

This enables stfle.151 and adds the subfunctions for DFLTCC. Bit 151 is
added to the list of facilities that will be enabled when there is no
cpu model involved as DFLTCC requires no additional handling from
userspace, e.g. for migration.

Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guests.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

4f45b90e

18 Apr, 2019 6 commits

KVM: s390: add enhanced sort facilty to cpu model · 173aec2d

Christian Borntraeger authored Dec 28, 2018

This enables stfle.150 and adds the subfunctions for SORTL. Bit 150 is
added to the list of facilities that will be enabled when there is no
cpu model involved as sortl requires no additional handling from
userspace, e.g. for migration.

Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guests.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

173aec2d

KVM: s390: provide query function for instructions returning 32 byte · d6681397

Christian Borntraeger authored Feb 20, 2019

Some of the new features have a 32byte response for the query function.
Provide a new wrapper similar to __cpacf_query. We might want to factor
this out if other users come up, as of today there is none. So let us
keep the function within KVM.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

d6681397

KVM: s390: add MSA9 to cpumodel · 13209ad0

Christian Borntraeger authored Dec 28, 2018

This enables stfle.155 and adds the subfunctions for KDSA. Bit 155 is
added to the list of facilities that will be enabled when there is no
cpu model involved as MSA9 requires no additional handling from
userspace, e.g. for migration.

Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guests.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

13209ad0

KVM: s390: add vector BCD enhancements facility to cpumodel · d5cb6ab1

Christian Borntraeger authored Dec 28, 2018

If vector support is enabled, the vector BCD enhancements facility
might also be enabled.
We can directly forward this facility to the guest if available
and VX is requested by user space.

Please note that user space can and will have the final decision
on the facility bits for a guests.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

d5cb6ab1

KVM: s390: add vector enhancements facility 2 to cpumodel · 7832e91c

Christian Borntraeger authored Dec 28, 2018

If vector support is enabled, the vector enhancements facility 2
might also be enabled.
We can directly forward this facility to the guest if available
and VX is requested by user space.

Please note that user space can and will have the final decision
on the facility bits for a guests.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>

7832e91c

KVM: s390: Fix potential spectre warnings · 58616e6a

Eric Farman authored Apr 17, 2019

Fix some warnings from smatch:

arch/s390/kvm/interrupt.c:2310 get_io_adapter() warn: potential spectre issue 'kvm->arch.adapters' [r] (local cap)
arch/s390/kvm/interrupt.c:2341 register_io_adapter() warn: potential spectre issue 'dev->kvm->arch.adapters' [w]
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190417005414.47801-1-farman@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

58616e6a

16 Apr, 2019 22 commits

kvm: move KVM_CAP_NR_MEMSLOTS to common code · c110ae57

Paolo Bonzini authored Mar 28, 2019

All architectures except MIPS were defining it in the same way,
and memory slots are handled entirely by common code so there
is no point in keeping the definition per-architecture.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c110ae57

KVM: x86: Inject #GP if guest attempts to set unsupported EFER bits · 0a629563

Sean Christopherson authored Apr 02, 2019

EFER.LME and EFER.NX are considered reserved if their respective feature
bits are not advertised to the guest.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0a629563

KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes · 11988499

Sean Christopherson authored Apr 02, 2019

KVM allows userspace to violate consistency checks related to the
guest's CPUID model to some degree.  Generally speaking, userspace has
carte blanche when it comes to guest state so long as jamming invalid
state won't negatively affect the host.

Currently this is seems to be a non-issue as most of the interesting
EFER checks are missing, e.g. NX and LME, but those will be added
shortly.  Proactively exempt userspace from the CPUID checks so as not
to break userspace.

Note, the efer_reserved_bits check still applies to userspace writes as
that mask reflects the host's capabilities, e.g. KVM shouldn't allow a
guest to run with NX=1 if it has been disabled in the host.

Fixes: d8017474 ("KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

11988499

KVM: nVMX: Return -EINVAL when signaling failure in VM-Entry helpers · c80add0f

Sean Christopherson authored Apr 11, 2019

Most, but not all, helpers that are related to emulating consistency
checks for nested VM-Entry return -EINVAL when a check fails.  Convert
the holdouts to have consistency throughout and to make it clear that
the functions are signaling pass/fail as opposed to "resume guest" vs.
"exit to userspace".

Opportunistically fix bad indentation in nested_vmx_check_guest_state().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c80add0f

KVM: nVMX: Return -EINVAL when signaling failure in pre-VM-Entry helpers · 98d9e858

Paolo Bonzini authored Apr 12, 2019

Convert all top-level nested VM-Enter consistency check functions to
return 0/-EINVAL instead of failure codes, since now they can only
ever return one failure code.

This also does not give the false impression that failure information is
always consumed and/or relevant, e.g. vmx_set_nested_state() only
cares whether or not the checks were successful.

nested_check_host_control_regs() can also now be inlined into its caller,
nested_vmx_check_host_state, since the two have effectively become the
same function.

Based on a patch by Sean Christopherson.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

98d9e858

KVM: nVMX: Rename and split top-level consistency checks to match SDM · 5478ba34

Sean Christopherson authored Apr 11, 2019

Rename the top-level consistency check functions to (loosely) align with
the SDM. Historically, KVM has used the terms "prereq" and "postreq" to
differentiate between consistency checks that lead to VM-Fail and those
that lead to VM-Exit. The terms are vague and potentially misleading,
e.g. "postreq" might be interpreted as occurring after VM-Entry.

Note, while the SDM lumps controls and host state into a single section,
"Checks on VMX Controls and Host-State Area", split them into separate
top-level functions as the two categories of checks result in different
VM instruction errors. This split will allow for additional cleanup.

Note #2, "vmentry" is intentionally dropped from the new function names
to avoid confusion with nested_check_vm_entry_controls(), and to keep
the length of the functions names somewhat manageable.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5478ba34

KVM: nVMX: Move guest non-reg state checks to VM-Exit path · 9c3e922b

Sean Christopherson authored Apr 11, 2019

Per Intel's SDM, volume 3, section Checking and Loading Guest State:

Because the checking and the loading occur concurrently, a failure may
be discovered only after some state has been loaded. For this reason,
the logical processor responds to such failures by loading state from
the host-state area, as it would for a VM exit.

In other words, a failed non-register state consistency check results in
a VM-Exit, not VM-Fail. Moving the non-reg state checks also paves the
way for renaming nested_vmx_check_vmentry_postreqs() to align with the
SDM, i.e. nested_vmx_check_vmentry_guest_state().

Fixes: 26539bd0 ("KVM: nVMX: check vmcs12 for valid activity state")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9c3e922b

kvm: nVMX: Check "load IA32_PAT" VM-entry control on vmentry · de2bc2bf

Krish Sadhukhan authored Apr 08, 2019

According to section "Checking and Loading Guest State" in Intel SDM vol
3C, the following check is performed on vmentry:

    If the "load IA32_PAT" VM-entry control is 1, the value of the field
    for the IA32_PAT MSR must be one that could be written by WRMSR
    without fault at CPL 0. Specifically, each of the 8 bytes in the
    field must have one of the values 0 (UC), 1 (WC), 4 (WT), 5 (WP),
    6 (WB), or 7 (UC-).
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

de2bc2bf

kvm: nVMX: Check "load IA32_PAT" VM-exit control on vmentry · f6b0db1f

Krish Sadhukhan authored Apr 08, 2019

According to section "Checks on Host Control Registers and MSRs" in Intel
SDM vol 3C, the following check is performed on vmentry:

    If the "load IA32_PAT" VM-exit control is 1, the value of the field
    for the IA32_PAT MSR must be one that could be written by WRMSR
    without fault at CPL 0. Specifically, each of the 8 bytes in the
    field must have one of the values 0 (UC), 1 (WC), 4 (WT), 5 (WP),
    6 (WB), or 7 (UC-).
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f6b0db1f

KVM: x86: optimize check for valid PAT value · 674ea351

Paolo Bonzini authored Apr 10, 2019

This check will soon be done on every nested vmentry and vmexit,
"parallelize" it using bitwise operations.
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

674ea351

KVM: x86: clear VM_EXIT_SAVE_IA32_PAT · f16cb57b

Paolo Bonzini authored Apr 10, 2019

This is not needed, PAT writes always take an MSR vmexit.
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f16cb57b

KVM: vmx: print more APICv fields in dump_vmcs · 9d609649

Paolo Bonzini authored Apr 15, 2019

The SVI, RVI, virtual-APIC page address and APIC-access page address fields
were left out of dump_vmcs.  Add them.

KERN_CONT technically isn't SMP safe, but it's okay to use it here since
the whole of dump_vmcs() is a single huge multi-line piece of output
that isn't SMP-safe.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9d609649

KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing · 7a223e06

Vitaly Kuznetsov authored Mar 27, 2019

In __apic_accept_irq() interface trig_mode is int and actually on some code
paths it is set above u8:

kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode
is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to
(1 << 15) & e->msi.data

kvm_apic_local_deliver sets it to reg & (1 << 15).

Fix the immediate issue by making 'tm' into u16. We may also want to adjust
__apic_accept_irq() interface and use proper sizes for vector, level,
trig_mode but this is not urgent.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7a223e06

KVM: fix spectrev1 gadgets · 1d487e9b

Paolo Bonzini authored Apr 11, 2019

These were found with smatch, and then generalized when applicable.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1d487e9b

KVM: x86: fix warning Using plain integer as NULL pointer · be43c440

Hariprasad Kelam authored Apr 06, 2019

Changed passing argument as "0 to NULL" which resolves below sparse warning

arch/x86/kvm/x86.c:3096:61: warning: Using plain integer as NULL pointer
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

be43c440

selftests: kvm: add a selftest for SMM · 79904c9d

Vitaly Kuznetsov authored Apr 10, 2019

Add a simple test for SMM, based on VMX.  The test implements its own
sync between the guest and the host as using our ucall library seems to
be too cumbersome: SMI handler is happening in real-address mode.

This patch also fixes KVM_SET_NESTED_STATE to happen after
KVM_SET_VCPU_EVENTS, in fact it places it last.  This is because
KVM needs to know whether the processor is in SMM or not.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

79904c9d

selftests: kvm: fix for compilers that do not support -no-pie · c2390f16

Paolo Bonzini authored Apr 11, 2019

-no-pie was added to GCC at the same time as their configuration option
--enable-default-pie.  Compilers that were built before do not have
-no-pie, but they also do not need it.  Detect the option at build
time.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c2390f16

selftests: kvm/evmcs_test: complete I/O before migrating guest state · c68c21ca

Paolo Bonzini authored Apr 11, 2019

Starting state migration after an IO exit without first completing IO
may result in test failures.  We already have two tests that need this
(this patch in fact fixes evmcs_test, similar to what was fixed for
state_test in commit 0f73bbc8, "KVM: selftests: complete IO before
migrating guest state", 2019-03-13) and a third is coming.  So, move the
code to vcpu_save_state, and while at it do not access register state
until after I/O is complete.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c68c21ca

KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels · b68f3cc7

Sean Christopherson authored Apr 02, 2019

Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.

KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.

SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hid long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).

Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allow it to continue running.

Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b68f3cc7

KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU · 8f4dc2e7

Sean Christopherson authored Apr 02, 2019

Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
state area, i.e. don't save/restore EFER across SMM transitions. KVM
somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
guest doesn't support long mode. But during RSM, KVM unconditionally
clears EFER so that it can get back to pure 32-bit mode in order to
start loading CRs with their actual non-SMM values.

Clear EFER only when it will be written when loading the non-SMM state
so as to preserve bits that can theoretically be set on 32-bit vCPUs,
e.g. KVM always emulates EFER_SCE.

And because CR4.PAE is cleared only to play nice with EFER, wrap that
code in the long mode check as well. Note, this may result in a
compiler warning about cr4 being consumed uninitialized. Re-read CR4
even though it's technically unnecessary, as doing so allows for more
readable code and RSM emulation is not a performance critical path.

8f4dc2e7

KVM: x86: clear SMM flags before loading state while leaving SMM · 9ec19493

Sean Christopherson authored Apr 02, 2019

RSM emulation is currently broken on VMX when the interrupted guest has
CR4.VMXE=1.  Stop dancing around the issue of HF_SMM_MASK being set when
loading SMSTATE into architectural state, e.g. by toggling it for
problematic flows, and simply clear HF_SMM_MASK prior to loading
architectural state (from SMRAM save state area).
Reported-by: Jon Doron <arilou@gmail.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 5bea5123 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9ec19493

KVM: x86: Open code kvm_set_hflags · c5833c7a

Sean Christopherson authored Apr 02, 2019

Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM
save state map, i.e. kvm_smm_changed() needs to be called after state
has been loaded and so cannot be done automatically when setting
hflags from RSM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c5833c7a