• Sean Christopherson's avatar
    KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup · a90e0979
    Sean Christopherson authored
    commit e8143499 upstream.
    
    ____kvm_handle_fault_on_reboot() provides a generic exception fixup
    handler that is used to cleanly handle faults on VMX/SVM instructions
    during reboot (or at least try to).  If there isn't a reboot in
    progress, ____kvm_handle_fault_on_reboot() treats any exception as
    fatal to KVM and invokes kvm_spurious_fault(), which in turn generates
    a BUG() to get a stack trace and die.
    
    When it was originally added by commit 4ecac3fd ("KVM: Handle
    virtualization instruction #UD faults during reboot"), the "call" to
    kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value
    is the RIP of the faulting instructing.
    
    The PUSH+JMP trickery is necessary because the exception fixup handler
    code lies outside of its associated function, e.g. right after the
    function.  An actual CALL from the .fixup code would show a slightly
    bogus stack trace, e.g. an extra "random" function would be inserted
    into the trace, as the return RIP on the stack would point to no known
    function (and the unwinder will likely try to guess who owns the RIP).
    
    Unfortunately, the JMP was replaced with a CALL when the macro was
    reworked to not spin indefinitely during reboot (commit b7c4145b
    "KVM: Don't spin on virt instruction faults during reboot").  This
    causes the aforementioned behavior where a bogus function is inserted
    into the stack trace, e.g. my builds like to blame free_kvm_area().
    
    Revert the CALL back to a JMP.  The changelog for commit b7c4145b
    ("KVM: Don't spin on virt instruction faults during reboot") contains
    nothing that indicates the switch to CALL was deliberate.  This is
    backed up by the fact that the PUSH <insn RIP> was left intact.
    
    Note that an alternative to the PUSH+JMP magic would be to JMP back
    to the "real" code and CALL from there, but that would require adding
    a JMP in the non-faulting path to avoid calling kvm_spurious_fault()
    and would add no value, i.e. the stack trace would be the same.
    
    Using CALL:
    
    ------------[ cut here ]------------
    kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
    invalid opcode: 0000 [#1] SMP
    CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
    Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
    RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0
    R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000
    FS:  00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0
    Call Trace:
     free_kvm_area+0x1044/0x43ea [kvm_intel]
     ? vmx_vcpu_run+0x156/0x630 [kvm_intel]
     ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
     ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
     ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
     ? __set_task_blocked+0x38/0x90
     ? __set_current_blocked+0x50/0x60
     ? __fpu__restore_sig+0x97/0x490
     ? do_vfs_ioctl+0xa1/0x620
     ? __x64_sys_futex+0x89/0x180
     ? ksys_ioctl+0x66/0x70
     ? __x64_sys_ioctl+0x16/0x20
     ? do_syscall_64+0x4f/0x100
     ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
    ---[ end trace 9775b14b123b1713 ]---
    
    Using JMP:
    
    ------------[ cut here ]------------
    kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
    invalid opcode: 0000 [#1] SMP
    CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
    Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
    RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0
    R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40
    FS:  00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0
    Call Trace:
     vmx_vcpu_run+0x156/0x630 [kvm_intel]
     ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
     ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
     ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
     ? __set_task_blocked+0x38/0x90
     ? __set_current_blocked+0x50/0x60
     ? __fpu__restore_sig+0x97/0x490
     ? do_vfs_ioctl+0xa1/0x620
     ? __x64_sys_futex+0x89/0x180
     ? ksys_ioctl+0x66/0x70
     ? __x64_sys_ioctl+0x16/0x20
     ? do_syscall_64+0x4f/0x100
     ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
    ---[ end trace f9daedb85ab3ddba ]---
    
    Fixes: b7c4145b ("KVM: Don't spin on virt instruction faults during reboot")
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    a90e0979
kvm_host.h 41.1 KB