    KVM: x86: Give host userspace full control of MSR_IA32_MISC_ENABLES · 9fc22296
    Sean Christopherson authored
    Give userspace full control of the read-only bits in MISC_ENABLES, i.e.
    do not modify bits on PMU refresh and do not preserve existing bits when
    userspace writes MISC_ENABLES.  With a few exceptions where KVM doesn't
    expose the necessary controls to userspace _and_ there is a clear cut
    association with CPUID, e.g. reserved CR4 bits, KVM does not own the vCPU
    and should not manipulate the vCPU model on behalf of "dummy user space".
    
    The argument that KVM is doing userspace a favor because "the order of
    setting vPMU capabilities and MSR_IA32_MISC_ENABLE is not strictly
    guaranteed" is specious, as attempting to configure MSRs on behalf of
    userspace inevitably leads to edge cases precisely because KVM does not
    prescribe a specific order of initialization.
    
    Example #1: intel_pmu_refresh() consumes and modifies the vCPU's
    MSR_IA32_PERF_CAPABILITIES, and so assumes userspace initializes config
    MSRs before setting the guest CPUID model.  If userspace sets CPUID
    first, then KVM will mark PEBS as available when arch.perf_capabilities
    is initialized with a non-zero PEBS format, thus creating a bad vCPU
    model if userspace later disables PEBS by writing PERF_CAPABILITIES.
    
    Example #2: intel_pmu_refresh() does not clear PERF_CAP_PEBS_MASK in
    MSR_IA32_PERF_CAPABILITIES if there is no vPMU, making KVM inconsistent
    in its desire to be consistent.
    
    Example #3: intel_pmu_refresh() does not clear MSR_IA32_MISC_ENABLE_EMON
    if KVM_SET_CPUID2 is called multiple times, first with a vPMU, then
    without a vPMU.  While slightly contrived, it's plausible a VMM could
    reflect KVM's default vCPU and then operate on KVM's copy of CPUID to
    later clear the vPMU settings, e.g. see KVM's selftests.
    
    Example #4: Enumerating an Intel vCPU on an AMD host will not call into
    intel_pmu_refresh() at any point, and so the BTS and PEBS "unavailable"
    bits will be left clear, without any way for userspace to set them.
    
    Keep the "R" behavior of bit 7, "EMON available", for the guest.
    Unlike the BTS and PEBS bits, which are fully "RO", the EMON bit can be
    written with a different value, but that new value is ignored.
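
    The write semantics described above can be sketched as a small
    standalone model.  This is an illustration only, not the KVM code
    itself: misc_enable_write(), its return convention (0 on success, 1
    for a write that would be rejected) and the shortened macro names are
    hypothetical.  The bit positions (EMON = bit 7, BTS unavailable =
    bit 11, PEBS unavailable = bit 12) follow the architectural layout of
    IA32_MISC_ENABLE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_MISC_ENABLE (names shortened here). */
#define MISC_ENABLE_EMON          (1ULL << 7)   /* "R": guest writes ignored */
#define MISC_ENABLE_BTS_UNAVAIL   (1ULL << 11)  /* "RO" for the guest */
#define MISC_ENABLE_PEBS_UNAVAIL  (1ULL << 12)  /* "RO" for the guest */
#define MISC_ENABLE_PMU_RO_MASK \
	(MISC_ENABLE_BTS_UNAVAIL | MISC_ENABLE_PEBS_UNAVAIL)

/*
 * Hypothetical model of a write to MISC_ENABLE.
 *
 * Host userspace (host_initiated == true) has full control: the value
 * is stored as-is and no existing bits are preserved.
 *
 * The guest may not change the RO bits (such a write is rejected), and
 * its value for the EMON bit is silently replaced by the current one.
 *
 * Returns 0 on success, 1 if the write is rejected.
 */
static int misc_enable_write(uint64_t *msr, uint64_t data, bool host_initiated)
{
	if (!host_initiated) {
		/* RO bits: any attempt to flip them is an error. */
		if ((*msr ^ data) & MISC_ENABLE_PMU_RO_MASK)
			return 1;

		/* R bit: keep the current EMON value, drop the guest's. */
		data &= ~MISC_ENABLE_EMON;
		data |= *msr & MISC_ENABLE_EMON;
	}

	*msr = data;
	return 0;
}
```

    In this model a host write replaces the whole value, a guest write
    that flips the BTS or PEBS "unavailable" bits fails and leaves the
    MSR unchanged, and a guest write that only differs in EMON succeeds
    with the EMON bit left as it was.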
    
    Cc: Like Xu <likexu@tencent.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Message-Id: <20220611005755.753273-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>