    KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode. · ff30ef40
    Quentin Casasnovas authored
    I couldn't get Xen to boot an L2 HVM guest when it was nested under KVM - it
    was getting a GP(0) on a rather unremarkable vmread from Xen:
    
         (XEN) ----[ Xen-4.7.0-rc  x86_64  debug=n  Not tainted ]----
         (XEN) CPU:    1
         (XEN) RIP:    e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
         (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
         (XEN) rax: ffff82d0801e6288   rbx: ffff83003ffbfb7c   rcx: fffffffffffab928
         (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff83000bdd0000
         (XEN) rbp: ffff83000bdd0000   rsp: ffff83003ffbfab0   r8:  ffff830038813910
         (XEN) r9:  ffff83003faf3958   r10: 0000000a3b9f7640   r11: ffff83003f82d418
         (XEN) r12: 0000000000000000   r13: ffff83003ffbffff   r14: 0000000000004802
         (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000001526e0
         (XEN) cr3: 000000003fc79000   cr2: 0000000000000000
         (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
         (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450):
         (XEN)  00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00
         (XEN) Xen stack trace from rsp=ffff83003ffbfab0:
    
         ...
    
         (XEN) Xen call trace:
         (XEN)    [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
         (XEN)    [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300
         (XEN)    [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60
         (XEN)    [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70
         (XEN)    [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0
         (XEN)    [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0
         (XEN)    [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0
         (XEN)    [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270
         (XEN)    [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0
         (XEN)    [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90
         (XEN)    [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140
         (XEN)    [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140
         (XEN)    [<ffff82d080164aeb>] context_switch+0x13b/0xe40
         (XEN)    [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570
         (XEN)    [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90
         (XEN)    [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50
         (XEN)
         (XEN)
         (XEN) ****************************************
         (XEN) Panic on CPU 1:
         (XEN) GENERAL PROTECTION FAULT
         (XEN) [error_code=0000]
         (XEN) ****************************************
    
    Tracing my host KVM showed it was the one injecting the GP(0) when
    emulating the VMREAD and checking the destination segment permissions in
    get_vmx_mem_address():
    
         3)               |    vmx_handle_exit() {
         3)               |      handle_vmread() {
         3)               |        nested_vmx_check_permission() {
         3)               |          vmx_get_segment() {
         3)   0.074 us    |            vmx_read_guest_seg_base();
         3)   0.065 us    |            vmx_read_guest_seg_selector();
         3)   0.066 us    |            vmx_read_guest_seg_ar();
         3)   1.636 us    |          }
         3)   0.058 us    |          vmx_get_rflags();
         3)   0.062 us    |          vmx_read_guest_seg_ar();
         3)   3.469 us    |        }
         3)               |        vmx_get_cs_db_l_bits() {
         3)   0.058 us    |          vmx_read_guest_seg_ar();
         3)   0.662 us    |        }
         3)               |        get_vmx_mem_address() {
         3)   0.068 us    |          vmx_cache_reg();
         3)               |          vmx_get_segment() {
         3)   0.074 us    |            vmx_read_guest_seg_base();
         3)   0.068 us    |            vmx_read_guest_seg_selector();
         3)   0.071 us    |            vmx_read_guest_seg_ar();
         3)   1.756 us    |          }
         3)               |          kvm_queue_exception_e() {
         3)   0.066 us    |            kvm_multiple_exception();
         3)   0.684 us    |          }
         3)   4.085 us    |        }
         3)   9.833 us    |      }
         3) + 10.366 us   |    }
    
    Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
    Developer's Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
    Control Structure", I found that we're enforcing that the destination
    operand is NOT located in a read-only data segment or any code segment when
    the L1 is in long mode - BUT that check should only happen when it is in
    protected mode.
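
    The mode-dependent check described above can be sketched roughly as
    follows. This is a minimal standalone illustration, not the actual vmx.c
    code: enum cpu_mode, struct seg, is_noncanonical() and
    vmx_mem_operand_faults() are hypothetical names, a 48-bit virtual address
    width is assumed, and the segment usability and limit checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the vCPU mode and segment state. */
enum cpu_mode { MODE_REAL, MODE_PROTECTED, MODE_LONG };

struct seg {
	/* 4-bit descriptor type: bit 3 = code segment,
	 * bit 1 = writable (data) or readable (code). */
	uint8_t type;
};

/* Assumes 48-bit virtual addresses: bits 63..47 must be all 0 or all 1. */
static bool is_noncanonical(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top != 0 && top != 0x1ffff;
}

/* Returns true if the memory operand of a VMX instruction should fault. */
static bool vmx_mem_operand_faults(enum cpu_mode mode, const struct seg *s,
				   uint64_t addr, bool is_write)
{
	assert(s != NULL);

	/* Long mode: the canonical-address check is the only check. */
	if (mode == MODE_LONG)
		return is_noncanonical(addr);

	/* Protected mode: segment type checks apply. */
	if (mode == MODE_PROTECTED) {
		if (is_write)
			/* Read-only data segment or any code segment. */
			return (s->type & 0xa) == 0 || (s->type & 0x8);
		/* Execute-only code segment. */
		return (s->type & 0xa) == 0x8;
	}

	return false;
}
```

    With this ordering, a VMREAD whose destination lies on a long-mode stack
    is judged only on whether the address is canonical, whatever the segment
    type bits happen to contain.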
    
    Shuffling the code a bit to make our emulation follow the specification
    allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
    without problems.
    
    Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
    Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
    Cc: Eugene Korenevsky <ekorenevsky@gmail.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Radim Krčmář <rkrcmar@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: linux-stable <stable@vger.kernel.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx.c 312 KB