    KVM: nVMX: Defer APICv updates while L2 is active until L1 is active · 7c69661e
    Sean Christopherson authored
    Defer APICv updates that occur while L2 is active until nested VM-Exit,
    i.e. until L1 regains control.  vmx_refresh_apicv_exec_ctrl() assumes L1
    is active and (a) stomps all over vmcs02 and (b) neglects to ever update
    vmcs01.  E.g. if vmcs12 doesn't enable the TPR shadow for L2 (and thus no
    APICv controls), L1 performs nested VM-Enter with APICv inhibited, and
    APICv becomes uninhibited while L2 is active, KVM will set various APICv
    controls in vmcs02 and trigger a failed VM-Entry.  The kicker is that,
    unless running with nested_early_check=1, KVM blames L1 and chaos ensues.
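    A minimal, self-contained C model of the deferral described above,
    assuming a single vCPU and one boolean standing in for the APICv controls
    of each VMCS. All names (struct model_vcpu, model_refresh_apicv_exec_ctrl(),
    model_nested_vmexit() and their fields) are hypothetical stand-ins, not
    KVM's actual code; only vmx_refresh_apicv_exec_ctrl(), vmcs01 and vmcs02
    come from this message. It shows that while L2 is active the refresh
    writes neither VMCS and only records that vmcs01 is stale, and that the
    emulated nested VM-Exit applies the pending update once vmcs01 is current.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified vCPU state; not KVM's real structures. */
struct model_vcpu {
	bool guest_mode;                 /* true while L2 is active (vmcs02 loaded) */
	bool apicv_active;               /* desired APICv state for L1 */
	bool vmcs01_apicv;               /* APICv controls programmed in vmcs01 */
	bool vmcs02_apicv;               /* APICv controls in vmcs02, derived from vmcs12 */
	bool update_vmcs01_apicv_status; /* deferred-update flag */
};

/* Apply the current APICv state.  While L2 is active, do NOT touch the
 * loaded VMCS (vmcs02); just remember that vmcs01 needs updating. */
static void model_refresh_apicv_exec_ctrl(struct model_vcpu *v)
{
	if (v->guest_mode) {
		v->update_vmcs01_apicv_status = true;
		return;
	}
	v->vmcs01_apicv = v->apicv_active;
}

/* Emulated nested VM-Exit: L1 regains control, vmcs01 is current again,
 * so any deferred update can now be applied to it. */
static void model_nested_vmexit(struct model_vcpu *v)
{
	v->guest_mode = false;
	if (v->update_vmcs01_apicv_status) {
		v->update_vmcs01_apicv_status = false;
		model_refresh_apicv_exec_ctrl(v);
	}
}
```

    Applying the update directly on VM-Exit keeps the model small; a real
    implementation could instead re-raise an APICv update request for the
    vCPU and let the normal request path do the work.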
    
    In all cases, ignoring vmcs02 and always deferring the inhibition change
    to vmcs01 is correct (or at least acceptable).  The ABSENT and DISABLE
    inhibitions cannot truly change while L2 is active (see below).
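    To make "inhibition change" concrete, here is a hedged sketch assuming
    APICv is gated by a per-VM bitmask of inhibit reasons and is usable only
    when no bit is set. The bit names mirror the reasons discussed in this
    message, but the enum, struct and functions are hypothetical.
    model_update_inhibit() reports whether the overall APICv state flipped,
    which is the case where each vCPU needs an APICv refresh (deferred to
    vmcs01 for any vCPU currently running L2).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inhibit bits named after the reasons discussed above;
 * the kernel's real enum and bit positions may differ. */
enum model_inhibit {
	MODEL_INHIBIT_DISABLE      = 1u << 0,
	MODEL_INHIBIT_ABSENT       = 1u << 1,
	MODEL_INHIBIT_IRQ_BLOCKING = 1u << 2,
	MODEL_INHIBIT_HYPERV       = 1u << 3, /* e.g. Hyper-V Auto EOI in use */
};

struct model_vm {
	unsigned int inhibits; /* OR of currently active inhibit reasons */
};

/* APICv may be used only when no inhibit reason is set. */
static bool model_apicv_allowed(const struct model_vm *vm)
{
	return vm->inhibits == 0;
}

/* Set or clear one reason.  Returns true only if the overall APICv state
 * flipped, i.e. if vCPUs need an APICv refresh. */
static bool model_update_inhibit(struct model_vm *vm, unsigned int reason, bool set)
{
	bool before = model_apicv_allowed(vm);

	if (set)
		vm->inhibits |= reason;
	else
		vm->inhibits &= ~reason;

	return model_apicv_allowed(vm) != before;
}
```

    Toggling one reason while another is still set does not flip the overall
    state, so in that case no refresh (deferred or otherwise) is needed.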
    
    IRQ_BLOCKING can change, but it is firmly a best-effort debug feature.
    Furthermore, only L2's APIC is accelerated/virtualized to the full extent
    possible, e.g. even if L1 passes through its APIC to L2, normal MMIO/MSR
    interception will apply to the virtual APIC managed by KVM.  The exception
    is the SELF_IPI register when x2APIC is enabled, but that's an acceptable
    hole.
    
    Lastly, Hyper-V's Auto EOI can technically be toggled if L1 exposes the
    MSRs to L2, but for that to work in any sane capacity, L1 would need to
    pass through IRQs to L2 as well, and IRQs must be intercepted to enable
    virtual interrupt delivery.  I.e. exposing Auto EOI to L2 and enabling
    VID for L2 are, for all intents and purposes, mutually exclusive.
    
    Lack of dynamic toggling is also why this scenario is all but impossible
    to encounter in KVM's current form.  But a future patch will pend an
    APICv update request _during_ vCPU creation to plug a race where a vCPU
    that's being created doesn't get included in the "all vCPUs request"
    because it's not yet visible to other vCPUs.  If userspace restores L2
    after VM creation (hello, KVM selftests), the first KVM_RUN will occur
    while L2 is active and thus service the APICv update request made during
    VM creation.
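    The creation-time request and its interaction with a restored L2 can be
    modeled as below, assuming pending requests are a bitmask serviced at the
    start of each run and that restoring nested state simply marks the vCPU
    as being in guest mode. MODEL_REQ_APICV_UPDATE, struct sketch_vcpu and the
    helper functions are hypothetical. The first run services the request
    pended at creation while L2 is active, so the update lands in the
    deferred flag rather than in either VMCS.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_REQ_APICV_UPDATE (1u << 0) /* hypothetical request bit */

struct sketch_vcpu {
	unsigned int requests;           /* pending request bits */
	bool guest_mode;                 /* true while L2 is active */
	bool vmcs01_apicv;               /* APICv controls in vmcs01 */
	bool update_vmcs01_apicv_status; /* deferred-update flag */
};

/* Pend an APICv update at creation so a broadcast "all vCPUs" request
 * that raced with creation (and missed this vCPU) is not lost. */
static void sketch_vcpu_create(struct sketch_vcpu *v)
{
	*v = (struct sketch_vcpu){ 0 };
	v->requests |= MODEL_REQ_APICV_UPDATE;
}

/* Refresh APICv state, deferring to vmcs01 while L2 is active. */
static void sketch_refresh(struct sketch_vcpu *v, bool apicv_active)
{
	if (v->guest_mode) {
		v->update_vmcs01_apicv_status = true;
		return;
	}
	v->vmcs01_apicv = apicv_active;
}

/* Service pending requests before (notionally) entering the guest. */
static void sketch_vcpu_run(struct sketch_vcpu *v, bool apicv_active)
{
	if (v->requests & MODEL_REQ_APICV_UPDATE) {
		v->requests &= ~MODEL_REQ_APICV_UPDATE;
		sketch_refresh(v, apicv_active);
	}
	/* ... guest entry would happen here ... */
}
```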
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220420013732.3308816-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>