    x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted · 009bce1d
    Sean Christopherson authored
    Choo! Choo!  All aboard the Split Lock Express, with direct service to
    Wreckage!
    
    Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
    SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
    exists, and is writable, on many generations of CPUs.  Writing the MSR,
    even with '0', can result in bizarre, undocumented behavior.
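
    The gating described above can be sketched in user space as follows.
    This is a minimal illustration, not the actual patch: the model
    numbers, the *_sketch() helpers, fake_wrmsr_test_ctrl() and the
    wrmsr_count counter are hypothetical stand-ins, and the only detail
    taken from the hardware definition is that split lock detection is
    controlled by bit 29 of MSR_TEST_CTRL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model numbers, for illustration only. */
#define EXAMPLE_SLD_MODEL      0x01u  /* pretend: on the SLD whitelist */
#define EXAMPLE_NON_SLD_MODEL  0x02u  /* pretend: e.g. an older CPU    */

/* Split lock detection enable bit in MSR_TEST_CTRL. */
#define TEST_CTRL_SPLIT_LOCK_DETECT (1ull << 29)

/* Counts simulated writes to MSR_TEST_CTRL (stand-in for a real WRMSR). */
static int wrmsr_count;

/* Set once at "boot" from the whitelist, consulted on every CPU init. */
static bool cpu_model_supports_sld;

static void fake_wrmsr_test_ctrl(uint64_t val)
{
    (void)val;
    wrmsr_count++;
}

static bool model_is_whitelisted(unsigned int model)
{
    static const unsigned int whitelist[] = { EXAMPLE_SLD_MODEL };

    for (size_t i = 0; i < sizeof(whitelist) / sizeof(whitelist[0]); i++) {
        if (whitelist[i] == model)
            return true;
    }
    return false;
}

/* Boot-time setup: remember whether this model may support SLD at all. */
static void split_lock_setup_sketch(unsigned int model)
{
    cpu_model_supports_sld = model_is_whitelisted(model);
}

/*
 * Per-CPU init, also reached by APs on resume. On non-whitelisted models
 * the MSR is never touched, not even to write '0'.
 */
static void split_lock_init_sketch(bool sld_on)
{
    if (!cpu_model_supports_sld)
        return;

    fake_wrmsr_test_ctrl(sld_on ? TEST_CTRL_SPLIT_LOCK_DETECT : 0);
}
```

    Calling split_lock_setup_sketch(EXAMPLE_NON_SLD_MODEL) followed by
    split_lock_init_sketch(false) leaves wrmsr_count untouched, which is
    the property the fix relies on.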
    
    This fixes a crash on Haswell when resuming from suspend with a live KVM
    guest.  Because APs use the standard SMP boot flow for resume, they will
    go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
    which runs even when sld_state==sld_off to ensure SLD is disabled.  On
    Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
    succeed and _may_ take the SMT _sibling_ out of VMX root mode.
    
    When KVM has an active guest, KVM performs VMXON as part of CPU onlining
    (see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
    flow is effectively:
    
      on_each_ap_cpu() {
         WRMSR(MSR_TEST_CTRL, 0)
         VMXON
      }
    
    As a result, the WRMSR can disable VMX on a different CPU that has
    already done VMXON.  This ultimately results in a #UD on VMPTRLD when
    KVM regains control and attempts to run its vCPUs.
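
    To make the ordering problem concrete, here is a toy model of the
    serialized flow. It is only a sketch under loud assumptions: four
    logical CPUs, SMT siblings paired as (0,1) and (2,3), every CPU
    treated as an AP, and the WRMSR side effect applied every time (the
    worst case of the "may" above). None of the names below exist in
    the kernel.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4  /* hypothetical: 2 cores x 2 SMT threads */

/* Per-CPU flag: is this logical CPU currently in VMX root mode? */
static bool in_vmx_root[NR_MODEL_CPUS];

/* Assumed topology: siblings differ only in the lowest bit. */
static int smt_sibling(int cpu)
{
    return cpu ^ 1;
}

/* Models the observed side effect: the write knocks the sibling out of VMX. */
static void model_wrmsr_test_ctrl(int cpu)
{
    in_vmx_root[smt_sibling(cpu)] = false;
}

static void model_vmxon(int cpu)
{
    in_vmx_root[cpu] = true;
}

/*
 * Bring CPUs online one at a time, as serialized SMP boot does, and
 * return how many CPUs are NOT in VMX root mode afterwards.
 */
static int online_cpus_serially(bool write_test_ctrl)
{
    int broken = 0;

    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
        in_vmx_root[cpu] = false;

    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
        if (write_test_ctrl)
            model_wrmsr_test_ctrl(cpu);
        model_vmxon(cpu);
    }

    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
        if (!in_vmx_root[cpu])
            broken++;
    }
    return broken;
}
```

    In this model the first thread of each core finishes VMXON and is
    then knocked back out when its sibling performs the WRMSR, so
    online_cpus_serially(true) reports two affected CPUs while
    online_cpus_serially(false) reports none.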
    
    The above voodoo was confirmed by reworking KVM's VMXON flow to write
    MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
    Further verification of the insanity was done by redoing VMXON on all
    APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
    which should VM-Fail, occasionally succeeded, and also eliminated the
    unexpected #UD on VMPTRLD.
    
    The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
    to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
    almost always hang (even when fudging VMXON), a #UD on a random Jcc was
    observed, suspend/resume stability is qualitatively poor, and so on and
    so forth.
    
      kernel BUG at arch/x86/kvm/x86.c:386!
      CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G      D
      Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
      RIP: 0010:kvm_spurious_fault+0xf/0x20
      Call Trace:
       vmx_vcpu_load_vmcs+0x1fb/0x2b0
       vmx_vcpu_load+0x3e/0x160
       kvm_arch_vcpu_load+0x48/0x260
       finish_task_switch+0x140/0x260
       __schedule+0x460/0x720
       _cond_resched+0x2d/0x40
       kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
       kvm_vcpu_ioctl+0x363/0x5c0
       ksys_ioctl+0x88/0xa0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x4c/0x170
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    Fixes: dbaba470 ("x86/split_lock: Rework the initialization flow of split lock detection")
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20200605192605.7439-1-sean.j.christopherson@intel.com