Commit c09dd2bb authored by Paolo Bonzini's avatar Paolo Bonzini

Merge branch 'kvm-redo-enable-virt' into HEAD

Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.

The primary motivation for this series is to simplify dealing with enabling
virtualization for Intel's TDX, which needs to enable virtualization
when kvm-intel.ko is loaded, i.e. long before the first VM is created.

That said, this is a nice cleanup on its own.  By registering the callbacks
on-demand, the callbacks themselves don't need to check kvm_usage_count,
because their very existence implies a non-zero count.

Patch 1 (re)adds a dedicated lock for kvm_usage_count.  This avoids a
lock ordering issue between cpus_read_lock() and kvm_lock.  The lock
ordering issue still exist in very rare cases, and will be fixed for
good by switching vm_list to an (S)RCU-protected list.
Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
parents 55f50b2f 590b09b1
......@@ -2648,6 +2648,23 @@
Default is Y (on).
kvm.enable_virt_at_load=[KVM,ARM64,LOONGARCH,MIPS,RISCV,X86]
If enabled, KVM will enable virtualization in hardware
when KVM is loaded, and disable virtualization when KVM
is unloaded (if KVM is built as a module).
If disabled, KVM will dynamically enable and disable
virtualization on-demand when creating and destroying
VMs, i.e. on the 0=>1 and 1=>0 transitions of the
number of VMs.
Enabling virtualization at module lode avoids potential
latency for creation of the 0=>1 VM, as KVM serializes
virtualization enabling across all online CPUs. The
"cost" of enabling virtualization when KVM is loaded,
is that doing so may interfere with using out-of-tree
hypervisors that want to "own" virtualization hardware.
kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
Default is false (don't support).
......
......@@ -11,6 +11,8 @@ The acquisition orders for mutexes are as follows:
- cpus_read_lock() is taken outside kvm_lock
- kvm_usage_lock is taken outside cpus_read_lock()
- kvm->lock is taken outside vcpu->mutex
- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
......@@ -24,6 +26,12 @@ The acquisition orders for mutexes are as follows:
are taken on the waiting side when modifying memslots, so MMU notifiers
must not take either kvm->slots_lock or kvm->slots_arch_lock.
cpus_read_lock() vs kvm_lock:
- Taking cpus_read_lock() outside of kvm_lock is problematic, despite that
being the official ordering, as it is quite easy to unknowingly trigger
cpus_read_lock() while holding kvm_lock. Use caution when walking vm_list,
e.g. avoid complex operations when possible.
For SRCU:
- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
......@@ -227,10 +235,16 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: any
:Protects: - vm_list
- kvm_usage_count
``kvm_usage_lock``
^^^^^^^^^^^^^^^^^^
:Type: mutex
:Arch: any
:Protects: - kvm_usage_count
- hardware virtualization enable/disable
:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
enable/disable.
:Comment: Exists to allow taking cpus_read_lock() while kvm_usage_count is
protected, which simplifies the virtualization enabling logic.
``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -290,11 +304,12 @@ time it will be set using the Dirty tracking mechanism described above.
wakeup.
``vendor_module_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^
:Type: mutex
:Arch: x86
:Protects: loading a vendor module (kvm_amd or kvm_intel)
:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
many operations need to take cpu_hotplug_lock when loading a vendor module,
e.g. updating static calls.
:Comment: Exists because using kvm_lock leads to deadlock. kvm_lock is taken
in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while
cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many
operations need to take cpu_hotplug_lock when loading a vendor module, e.g.
updating static calls.
......@@ -2164,7 +2164,7 @@ static void cpu_hyp_uninit(void *discard)
}
}
int kvm_arch_hardware_enable(void)
int kvm_arch_enable_virtualization_cpu(void)
{
/*
* Most calls to this function are made with migration
......@@ -2184,7 +2184,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}
void kvm_arch_hardware_disable(void)
void kvm_arch_disable_virtualization_cpu(void)
{
kvm_timer_cpu_down();
kvm_vgic_cpu_down();
......@@ -2380,7 +2380,7 @@ static int __init do_pkvm_init(u32 hyp_va_bits)
/*
* The stub hypercalls are now disabled, so set our local flag to
* prevent a later re-init attempt in kvm_arch_hardware_enable().
* prevent a later re-init attempt in kvm_arch_enable_virtualization_cpu().
*/
__this_cpu_write(kvm_hyp_initialized, 1);
preempt_enable();
......
......@@ -261,7 +261,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -ENOIOCTLCMD;
}
int kvm_arch_hardware_enable(void)
int kvm_arch_enable_virtualization_cpu(void)
{
unsigned long env, gcfg = 0;
......@@ -300,7 +300,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}
void kvm_arch_hardware_disable(void)
void kvm_arch_disable_virtualization_cpu(void)
{
write_csr_gcfg(0);
write_csr_gstat(0);
......
......@@ -728,8 +728,8 @@ struct kvm_mips_callbacks {
int (*handle_fpe)(struct kvm_vcpu *vcpu);
int (*handle_msa_disabled)(struct kvm_vcpu *vcpu);
int (*handle_guest_exit)(struct kvm_vcpu *vcpu);
int (*hardware_enable)(void);
void (*hardware_disable)(void);
int (*enable_virtualization_cpu)(void);
void (*disable_virtualization_cpu)(void);
int (*check_extension)(struct kvm *kvm, long ext);
int (*vcpu_init)(struct kvm_vcpu *vcpu);
void (*vcpu_uninit)(struct kvm_vcpu *vcpu);
......
......@@ -125,14 +125,14 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return 1;
}
int kvm_arch_hardware_enable(void)
int kvm_arch_enable_virtualization_cpu(void)
{
return kvm_mips_callbacks->hardware_enable();
return kvm_mips_callbacks->enable_virtualization_cpu();
}
void kvm_arch_hardware_disable(void)
void kvm_arch_disable_virtualization_cpu(void)
{
kvm_mips_callbacks->hardware_disable();
kvm_mips_callbacks->disable_virtualization_cpu();
}
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
......
......@@ -2869,7 +2869,7 @@ static unsigned int kvm_vz_resize_guest_vtlb(unsigned int size)
return ret + 1;
}
static int kvm_vz_hardware_enable(void)
static int kvm_vz_enable_virtualization_cpu(void)
{
unsigned int mmu_size, guest_mmu_size, ftlb_size;
u64 guest_cvmctl, cvmvmconfig;
......@@ -2983,7 +2983,7 @@ static int kvm_vz_hardware_enable(void)
return 0;
}
static void kvm_vz_hardware_disable(void)
static void kvm_vz_disable_virtualization_cpu(void)
{
u64 cvmvmconfig;
unsigned int mmu_size;
......@@ -3280,8 +3280,8 @@ static struct kvm_mips_callbacks kvm_vz_callbacks = {
.handle_msa_disabled = kvm_trap_vz_handle_msa_disabled,
.handle_guest_exit = kvm_trap_vz_handle_guest_exit,
.hardware_enable = kvm_vz_hardware_enable,
.hardware_disable = kvm_vz_hardware_disable,
.enable_virtualization_cpu = kvm_vz_enable_virtualization_cpu,
.disable_virtualization_cpu = kvm_vz_disable_virtualization_cpu,
.check_extension = kvm_vz_check_extension,
.vcpu_init = kvm_vz_vcpu_init,
.vcpu_uninit = kvm_vz_vcpu_uninit,
......
......@@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
}
int kvm_arch_hardware_enable(void)
int kvm_arch_enable_virtualization_cpu(void)
{
csr_write(CSR_HEDELEG, KVM_HEDELEG_DEFAULT);
csr_write(CSR_HIDELEG, KVM_HIDELEG_DEFAULT);
......@@ -35,7 +35,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}
void kvm_arch_hardware_disable(void)
void kvm_arch_disable_virtualization_cpu(void)
{
kvm_riscv_aia_disable();
......
......@@ -14,8 +14,8 @@ BUILD_BUG_ON(1)
* be __static_call_return0.
*/
KVM_X86_OP(check_processor_compatibility)
KVM_X86_OP(hardware_enable)
KVM_X86_OP(hardware_disable)
KVM_X86_OP(enable_virtualization_cpu)
KVM_X86_OP(disable_virtualization_cpu)
KVM_X86_OP(hardware_unsetup)
KVM_X86_OP(has_emulated_msr)
KVM_X86_OP(vcpu_after_set_cpuid)
......
......@@ -36,6 +36,7 @@
#include <asm/kvm_page_track.h>
#include <asm/kvm_vcpu_regs.h>
#include <asm/hyperv-tlfs.h>
#include <asm/reboot.h>
#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
......@@ -1629,8 +1630,10 @@ struct kvm_x86_ops {
int (*check_processor_compatibility)(void);
int (*hardware_enable)(void);
void (*hardware_disable)(void);
int (*enable_virtualization_cpu)(void);
void (*disable_virtualization_cpu)(void);
cpu_emergency_virt_cb *emergency_disable_virtualization_cpu;
void (*hardware_unsetup)(void);
bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
......
......@@ -25,8 +25,8 @@ void __noreturn machine_real_restart(unsigned int type);
#define MRR_BIOS 0
#define MRR_APM 1
#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
typedef void (cpu_emergency_virt_cb)(void);
#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_disable_virtualization(void);
......
......@@ -592,14 +592,14 @@ static inline void kvm_cpu_svm_disable(void)
}
}
static void svm_emergency_disable(void)
static void svm_emergency_disable_virtualization_cpu(void)
{
kvm_rebooting = true;
kvm_cpu_svm_disable();
}
static void svm_hardware_disable(void)
static void svm_disable_virtualization_cpu(void)
{
/* Make sure we clean up behind us */
if (tsc_scaling)
......@@ -610,7 +610,7 @@ static void svm_hardware_disable(void)
amd_pmu_disable_virt();
}
static int svm_hardware_enable(void)
static int svm_enable_virtualization_cpu(void)
{
struct svm_cpu_data *sd;
......@@ -1533,7 +1533,7 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
* TSC_AUX is always virtualized for SEV-ES guests when the feature is
* available. The user return MSR support is not required in this case
* because TSC_AUX is restored on #VMEXIT from the host save area
* (which has been initialized in svm_hardware_enable()).
* (which has been initialized in svm_enable_virtualization_cpu()).
*/
if (likely(tsc_aux_uret_slot >= 0) &&
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
......@@ -3144,7 +3144,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
* feature is available. The user return MSR support is not
* required in this case because TSC_AUX is restored on #VMEXIT
* from the host save area (which has been initialized in
* svm_hardware_enable()).
* svm_enable_virtualization_cpu()).
*/
if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) && sev_es_guest(vcpu->kvm))
break;
......@@ -4992,8 +4992,9 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.check_processor_compatibility = svm_check_processor_compat,
.hardware_unsetup = svm_hardware_unsetup,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
.enable_virtualization_cpu = svm_enable_virtualization_cpu,
.disable_virtualization_cpu = svm_disable_virtualization_cpu,
.emergency_disable_virtualization_cpu = svm_emergency_disable_virtualization_cpu,
.has_emulated_msr = svm_has_emulated_msr,
.vcpu_create = svm_vcpu_create,
......@@ -5425,8 +5426,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
static void __svm_exit(void)
{
kvm_x86_vendor_exit();
cpu_emergency_unregister_virt_callback(svm_emergency_disable);
}
static int __init svm_init(void)
......@@ -5442,8 +5441,6 @@ static int __init svm_init(void)
if (r)
return r;
cpu_emergency_register_virt_callback(svm_emergency_disable);
/*
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
......
......@@ -23,8 +23,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.hardware_unsetup = vmx_hardware_unsetup,
.hardware_enable = vmx_hardware_enable,
.hardware_disable = vmx_hardware_disable,
.enable_virtualization_cpu = vmx_enable_virtualization_cpu,
.disable_virtualization_cpu = vmx_disable_virtualization_cpu,
.emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu,
.has_emulated_msr = vmx_has_emulated_msr,
.vm_size = sizeof(struct kvm_vmx),
......
......@@ -755,7 +755,7 @@ static int kvm_cpu_vmxoff(void)
return -EIO;
}
static void vmx_emergency_disable(void)
void vmx_emergency_disable_virtualization_cpu(void)
{
int cpu = raw_smp_processor_id();
struct loaded_vmcs *v;
......@@ -2844,7 +2844,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
return -EFAULT;
}
int vmx_hardware_enable(void)
int vmx_enable_virtualization_cpu(void)
{
int cpu = raw_smp_processor_id();
u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
......@@ -2881,7 +2881,7 @@ static void vmclear_local_loaded_vmcss(void)
__loaded_vmcs_clear(v);
}
void vmx_hardware_disable(void)
void vmx_disable_virtualization_cpu(void)
{
vmclear_local_loaded_vmcss();
......@@ -8584,8 +8584,6 @@ static void __vmx_exit(void)
{
allow_smaller_maxphyaddr = false;
cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
vmx_cleanup_l1d_flush();
}
......@@ -8632,8 +8630,6 @@ static int __init vmx_init(void)
pi_init_cpu(cpu);
}
cpu_emergency_register_virt_callback(vmx_emergency_disable);
vmx_check_vmcs12_offsets();
/*
......
......@@ -13,8 +13,9 @@ extern struct kvm_x86_init_ops vt_init_ops __initdata;
void vmx_hardware_unsetup(void);
int vmx_check_processor_compat(void);
int vmx_hardware_enable(void);
void vmx_hardware_disable(void);
int vmx_enable_virtualization_cpu(void);
void vmx_disable_virtualization_cpu(void);
void vmx_emergency_disable_virtualization_cpu(void);
int vmx_vm_init(struct kvm *kvm);
void vmx_vm_destroy(struct kvm *kvm);
int vmx_vcpu_precreate(struct kvm *kvm);
......
......@@ -355,7 +355,7 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
/*
* Disabling irqs at this point since the following code could be
* interrupted and executed through kvm_arch_hardware_disable()
* interrupted and executed through kvm_arch_disable_virtualization_cpu()
*/
local_irq_save(flags);
if (msrs->registered) {
......@@ -9753,7 +9753,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
guard(mutex)(&vendor_module_lock);
if (kvm_x86_ops.hardware_enable) {
if (kvm_x86_ops.enable_virtualization_cpu) {
pr_err("already loaded vendor module '%s'\n", kvm_x86_ops.name);
return -EEXIST;
}
......@@ -9880,7 +9880,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
return 0;
out_unwind_ops:
kvm_x86_ops.hardware_enable = NULL;
kvm_x86_ops.enable_virtualization_cpu = NULL;
kvm_x86_call(hardware_unsetup)();
out_mmu_exit:
kvm_mmu_vendor_module_exit();
......@@ -9921,7 +9921,7 @@ void kvm_x86_vendor_exit(void)
WARN_ON(static_branch_unlikely(&kvm_xen_enabled.key));
#endif
mutex_lock(&vendor_module_lock);
kvm_x86_ops.hardware_enable = NULL;
kvm_x86_ops.enable_virtualization_cpu = NULL;
mutex_unlock(&vendor_module_lock);
}
EXPORT_SYMBOL_GPL(kvm_x86_vendor_exit);
......@@ -12516,7 +12516,17 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
int kvm_arch_hardware_enable(void)
void kvm_arch_enable_virtualization(void)
{
cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
}
void kvm_arch_disable_virtualization(void)
{
cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
}
int kvm_arch_enable_virtualization_cpu(void)
{
struct kvm *kvm;
struct kvm_vcpu *vcpu;
......@@ -12532,7 +12542,7 @@ int kvm_arch_hardware_enable(void)
if (ret)
return ret;
ret = kvm_x86_call(hardware_enable)();
ret = kvm_x86_call(enable_virtualization_cpu)();
if (ret != 0)
return ret;
......@@ -12612,9 +12622,9 @@ int kvm_arch_hardware_enable(void)
return 0;
}
void kvm_arch_hardware_disable(void)
void kvm_arch_disable_virtualization_cpu(void)
{
kvm_x86_call(hardware_disable)();
kvm_x86_call(disable_virtualization_cpu)();
drop_user_return_notifiers();
}
......
......@@ -1529,8 +1529,22 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif
#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
/*
* kvm_arch_{enable,disable}_virtualization() are called on one CPU, under
* kvm_usage_lock, immediately after/before 0=>1 and 1=>0 transitions of
* kvm_usage_count, i.e. at the beginning of the generic hardware enabling
* sequence, and at the end of the generic hardware disabling sequence.
*/
void kvm_arch_enable_virtualization(void);
void kvm_arch_disable_virtualization(void);
/*
* kvm_arch_{enable,disable}_virtualization_cpu() are called on "every" CPU to
* do the actual twiddling of hardware bits. The hooks are called on all
* online CPUs when KVM enables/disabled virtualization, and on a single CPU
* when that CPU is onlined/offlined (including for Resume/Suspend).
*/
int kvm_arch_enable_virtualization_cpu(void);
void kvm_arch_disable_virtualization_cpu(void);
#endif
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
......
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment