1. 15 Nov, 2019 7 commits
    • KVM: VMX: Consume pending LAPIC INIT event when exit on INIT_SIGNAL · e64a8508
      Liran Alon authored
      Intel SDM section 25.2 OTHER CAUSES OF VM EXITS specifies the following
      on INIT signals: "Such exits do not modify register state or clear pending
      events as they would outside of VMX operation."
      
      When commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      was applied, I interpreted the above Intel SDM statement to mean that
      an INIT_SIGNAL exit does not consume the pending LAPIC INIT event.
      
      However, when Nadav Amit ran the matching kvm-unit-test on a bare-metal
      machine, it turned out my interpretation was wrong, i.e. an INIT_SIGNAL
      exit does consume the pending LAPIC INIT event.
      (See: https://www.spinics.net/lists/kvm/msg196757.html)
      
      Therefore, fix KVM code to behave as observed on bare-metal.
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e64a8508
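The behaviour described in this commit can be illustrated with a minimal sketch. This is a toy model, not the KVM code: `struct toy_lapic`, its `init_pending` flag, and `toy_exit_on_init_signal()` are hypothetical names invented for illustration. It only shows the rule observed on bare metal, namely that delivering an INIT_SIGNAL exit also clears the pending INIT event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vCPU's local APIC state. */
struct toy_lapic {
    bool init_pending;  /* an INIT event has been latched */
};

/*
 * Toy model of an INIT_SIGNAL VM exit.
 * Returns true if an exit was delivered. Matching the bare-metal
 * observation, the exit consumes the pending INIT event, so a second
 * call without a new INIT delivers nothing.
 */
static bool toy_exit_on_init_signal(struct toy_lapic *apic)
{
    assert(apic != NULL);

    if (!apic->init_pending)
        return false;

    apic->init_pending = false;  /* consumed by the exit */
    return true;
}
```

Under the earlier interpretation, the flag would have stayed set after the exit and the same INIT would have been seen again later.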
    • KVM: x86: Prevent set vCPU into INIT/SIPI_RECEIVED state when INIT are latched · 27cbe7d6
      Liran Alon authored
      Commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      fixed KVM to also latch pending LAPIC INIT event when vCPU is in VMX
      operation.
      
      However, the current KVM_SET_MP_STATE API allows userspace to put the
      vCPU into KVM_MP_STATE_SIPI_RECEIVED or KVM_MP_STATE_INIT_RECEIVED even
      when the vCPU is in VMX operation.
      
      Fix this by introducing a utility method that checks whether the vCPU
      state latches INIT signals, and use it in the KVM_SET_MP_STATE handler.
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      27cbe7d6
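A rough sketch of the check this commit describes, under stated assumptions: the types and function names below (`struct toy_vcpu`, `toy_vcpu_latches_init()`, `toy_set_mp_state()`) are hypothetical and much simpler than the real handler. The sketch assumes INIT is latched both in SMM and in VMX operation, and that the handler rejects the two states with `-EINVAL`; the actual conditions and error handling in KVM may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of multiprocessing states. */
enum toy_mp_state {
    TOY_MP_RUNNABLE,
    TOY_MP_INIT_RECEIVED,
    TOY_MP_SIPI_RECEIVED,
};

/* Hypothetical vCPU model with only the fields the check needs. */
struct toy_vcpu {
    bool in_smm;
    bool in_vmx_operation;
    enum toy_mp_state mp_state;
};

/* Utility check: does the current vCPU state latch INIT signals? */
static bool toy_vcpu_latches_init(const struct toy_vcpu *vcpu)
{
    return vcpu->in_smm || vcpu->in_vmx_operation;
}

/* Toy "set MP state" handler: refuse INIT/SIPI states while INIT is latched. */
static int toy_set_mp_state(struct toy_vcpu *vcpu, enum toy_mp_state state)
{
    assert(vcpu != NULL);

    if (toy_vcpu_latches_init(vcpu) &&
        (state == TOY_MP_INIT_RECEIVED || state == TOY_MP_SIPI_RECEIVED))
        return -EINVAL;

    vcpu->mp_state = state;
    return 0;
}
```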
    • KVM: x86: Evaluate latched_init in KVM_SET_VCPU_EVENTS when vCPU not in SMM · ff90afa7
      Liran Alon authored
      Commit 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      fixed KVM to also latch pending LAPIC INIT event when vCPU is in VMX
      operation.
      
      However, the current KVM_SET_VCPU_EVENTS API defines the latched_init
      field as part of the SMM state and only sets the pending LAPIC INIT
      event if the vCPU is specified to be in SMM mode (events->smi.smm is set).
      
      Change the KVM_SET_VCPU_EVENTS handler to set the pending LAPIC INIT
      event from the latched_init field regardless of whether the vCPU is in
      SMM mode.
      
      Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ff90afa7
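The change in semantics can be sketched as follows. This is an illustrative model only: `struct toy_smi_events`, `struct toy_vcpu_state`, and `toy_set_vcpu_events()` are hypothetical names, and the real handler deals with many more fields and validity flags. The point is simply that `latched_init` is now applied whether or not `smm` is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the SMI-related event fields passed by userspace. */
struct toy_smi_events {
    bool smm;           /* vCPU should be in SMM */
    bool latched_init;  /* an INIT event is latched */
};

/* Hypothetical vCPU state touched by the handler. */
struct toy_vcpu_state {
    bool in_smm;
    bool init_pending;
};

/*
 * Toy handler after the change: the pending INIT event follows
 * latched_init unconditionally. Before the change, the assignment to
 * init_pending would only have happened when ev->smm was set.
 */
static void toy_set_vcpu_events(struct toy_vcpu_state *vcpu,
                                const struct toy_smi_events *ev)
{
    assert(vcpu != NULL && ev != NULL);

    vcpu->in_smm = ev->smm;
    vcpu->init_pending = ev->latched_init;
}
```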
    • x86: retpolines: eliminate retpoline from msr event handlers · 74c504a6
      Andrea Arcangeli authored
      It's enough to check the value and issue the direct call.
      
      After this commit is applied, here are the most common retpolines executed
      under a high-resolution timer workload in the guest on a VMX host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 267
      @[]: 2256
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          __kvm_wait_lapic_expire+284
          vmx_vcpu_run.part.97+1091
          vcpu_enter_guest+377
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 2390
      @[]: 33410
      
      @total: 315707
      
      Note the highest hit above is __delay so probably not worth optimizing
      even if it would be more frequent than 2k hits per sec.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      74c504a6
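The idea of "check the value and issue the direct call" can be sketched generically. The handler names below are hypothetical and unrelated to the actual MSR event handlers; the sketch only shows the pattern of comparing a function pointer against a known common target so that the hot path becomes a direct call, leaving the indirect call (and, in a retpoline build, its thunk) for the uncommon case.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_handler_t)(int arg);

/* Hypothetical handlers: one assumed to be hit most of the time, one rare. */
static int toy_common_handler(int arg) { return arg + 1; }
static int toy_rare_handler(int arg)   { return arg * 2; }

/*
 * Dispatch through a function pointer, but short-circuit the common
 * target with a direct call. Only the fallback is an indirect call.
 */
static int toy_dispatch(toy_handler_t fn, int arg)
{
    assert(fn != NULL);

    if (fn == toy_common_handler)
        return toy_common_handler(arg);  /* direct call */

    return fn(arg);                      /* indirect call */
}
```

The result is the same on both paths; only the kind of call instruction the compiler can emit differs.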
    • KVM: retpolines: x86: eliminate retpoline from svm.c exit handlers · 3dcb2a3f
      Andrea Arcangeli authored
      It's enough to check the exit value and issue a direct call to avoid
      the retpoline for all the common vmexit reasons.
      
      After this commit is applied, here are the most common retpolines executed
      under a high-resolution timer workload in the guest on an SVM host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get_update_offsets_now+70
          hrtimer_interrupt+131
          smp_apic_timer_interrupt+106
          apic_timer_interrupt+15
          start_sw_timer+359
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 1940
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_r12+33
          force_qs_rnp+217
          rcu_gp_kthread+1270
          kthread+268
          ret_from_fork+34
      ]: 4644
      @[]: 25095
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41474
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_start_range_ns+528
          start_sw_timer+356
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 41887
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          lapic_next_event+28
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42723
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          clockevents_program_event+148
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42766
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          clockevents_program_event+84
          hrtimer_try_to_cancel+168
          hrtimer_cancel+21
          kvm_set_lapic_tscdeadline_msr+43
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 42848
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          ktime_get+58
          start_sw_timer+279
          restart_apic_timer+85
          kvm_set_msr_common+1497
          msr_interception+142
          vcpu_enter_guest+684
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 499845
      
      @total: 1780243
      
      SVM has no TSC based programmable preemption timer so it is invoking
      ktime_get() frequently.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3dcb2a3f
    • KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers · 4289d272
      Andrea Arcangeli authored
      It's enough to check the exit value and issue a direct call to avoid
      the retpoline for all the common vmexit reasons.
      
      Of course, CONFIG_RETPOLINE already forbids gcc from using indirect jumps
      when compiling switch() statements; however, switch() would still allow
      the compiler to bisect the case value. It's more efficient to
      prioritize the most frequent vmexits instead.
      
      Halt may be a slow path from the guest's point of view, but not
      necessarily from the host's if the host runs at full CPU capacity
      and no host CPU is ever left idle.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4289d272
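A minimal sketch of prioritizing frequent exits ahead of a handler table, assuming a made-up set of exit reasons. The enum values, handler functions, and the choice of which exits count as "frequent" are all hypothetical; the real list of prioritized vmexits is in the commit itself. The explicit if-chain tests the hot reasons in a fixed order and calls their handlers directly, and everything else falls through to the indirect table lookup.

```c
#include <assert.h>

/* Hypothetical exit reasons; the first three are assumed to be the hot ones. */
enum toy_exit_reason {
    TOY_EXIT_MSR_WRITE,
    TOY_EXIT_TIMER,
    TOY_EXIT_HLT,
    TOY_EXIT_RARE,
    TOY_EXIT_MAX
};

static int toy_handle_msr_write(void) { return 10; }
static int toy_handle_timer(void)     { return 20; }
static int toy_handle_hlt(void)       { return 30; }
static int toy_handle_rare(void)      { return 40; }

/* Generic table used for every reason that is not prioritized. */
static int (*const toy_exit_handlers[TOY_EXIT_MAX])(void) = {
    [TOY_EXIT_MSR_WRITE] = toy_handle_msr_write,
    [TOY_EXIT_TIMER]     = toy_handle_timer,
    [TOY_EXIT_HLT]       = toy_handle_hlt,
    [TOY_EXIT_RARE]      = toy_handle_rare,
};

static int toy_handle_exit(unsigned int reason)
{
    assert(reason < TOY_EXIT_MAX);

    /* Hot exits first, most frequent at the top, each a direct call. */
    if (reason == TOY_EXIT_MSR_WRITE)
        return toy_handle_msr_write();
    else if (reason == TOY_EXIT_TIMER)
        return toy_handle_timer();
    else if (reason == TOY_EXIT_HLT)
        return toy_handle_hlt();

    /* Cold exits keep the indirect call through the table. */
    return toy_exit_handlers[reason]();
}
```

Unlike a switch(), where the compiler is free to bisect the case values, the order of the comparisons here is exactly the order written.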
    • KVM: x86: optimize more exit handlers in vmx.c · f399e60c
      Andrea Arcangeli authored
      Eliminate the wasteful call/ret in the non-RETPOLINE case and the
      unnecessary fentry dynamic tracing hook points.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f399e60c
  2. 11 Nov, 2019 1 commit
  3. 02 Nov, 2019 1 commit
    • KVM: x86: switch KVMCLOCK base to monotonic raw clock · 53fafdbb
      Marcelo Tosatti authored
      Commit 0bc48bea ("KVM: x86: update master clock before computing
      kvmclock_offset") switches the order of operations to avoid the
      conversion
      
      TSC (without frequency correction) ->
      system_timestamp (with frequency correction),
      
      which might cause a time jump.
      
      However, it leaves any other masterclock update unsafe, which includes,
      at the moment:
      
              * HV_X64_MSR_REFERENCE_TSC MSR write.
              * TSC writes.
              * Host suspend/resume.
      
      Avoid the time jump issue by using frequency uncorrected
      CLOCK_MONOTONIC_RAW clock.
      
      It is the responsibility of the guest's timekeeping software
      to track and correct a reference clock such as UTC.
      
      This fixes a forward time jump (which can result in
      failure to bring up a vCPU) during vCPU hotplug:
      
      Oct 11 14:48:33 storage kernel: CPU2 has been hot-added
      Oct 11 14:48:34 storage kernel: CPU3 has been hot-added
      Oct 11 14:49:22 storage kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2          <-- time jump of almost 1 minute
      Oct 11 14:49:22 storage kernel: smpboot: do_boot_cpu failed(-1) to wakeup CPU#2
      Oct 11 14:49:23 storage kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3
      Oct 11 14:49:23 storage kernel: kvm-clock: cpu 3, msr 0:7ff640c1, secondary cpu clock
      
      Which happens because:
      
                      /*
                       * Wait 10s total for a response from AP
                       */
                      boot_error = -1;
                      timeout = jiffies + 10*HZ;
                      while (time_before(jiffies, timeout)) {
                               ...
                      }
      Analyzed-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      53fafdbb
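Why a forward time jump makes the 10 second wait fail can be shown with a small simulation. This is a toy model, not the smpboot code: `toy_wait_for_ap()` and its parameters are hypothetical, the tick counter is a plain local variable standing in for jiffies, and counter wraparound (which the kernel's time_before() handles) is ignored. The loop polls once per tick; if the clock jumps forward past the deadline before the AP is ready, the loop exits with an error even though little real time has passed.

```c
#include <assert.h>

/*
 * Toy model of the AP wait loop.
 *   hz            ticks per second
 *   ap_ready_tick number of real ticks after which the AP responds
 *   jump_at_tick  real tick at which the clock jumps (0 = never)
 *   jump_ticks    size of the forward jump, in ticks
 * Returns 0 if the AP responded before the deadline, -1 on timeout.
 */
static int toy_wait_for_ap(unsigned long hz, unsigned long ap_ready_tick,
                           unsigned long jump_at_tick, unsigned long jump_ticks)
{
    unsigned long jiffies = 0;     /* simulated clock */
    unsigned long real_ticks = 0;  /* ticks that actually elapsed */
    const unsigned long timeout = jiffies + 10 * hz;

    assert(hz > 0);

    while (jiffies < timeout) {
        if (real_ticks >= ap_ready_tick)
            return 0;              /* AP came up in time */

        real_ticks++;
        jiffies++;
        if (real_ticks == jump_at_tick)
            jiffies += jump_ticks; /* forward time jump */
    }
    return -1;                     /* deadline passed */
}
```

With hz = 100 and an AP that needs 2 seconds, the wait succeeds without a jump, but a 48 second jump half a second in pushes the simulated clock past the deadline and the wait reports failure, which mirrors the do_boot_cpu failure in the log above.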
  4. 31 Oct, 2019 1 commit
    • Merge tag 'kvm-ppc-next-5.5-1' of... · e7011c5d
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-next-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      KVM PPC update for 5.5
      
      * Add capability to tell userspace whether we can single-step the guest.
      
      * Improve the allocation of XIVE virtual processor IDs, to reduce the
        risk of running out of IDs when running many VMs on POWER9.
      
      * Rewrite interrupt synthesis code to deliver interrupts in virtual
        mode when appropriate.
      
      * Minor cleanups and improvements.
      e7011c5d
  5. 25 Oct, 2019 1 commit
  6. 22 Oct, 2019 29 commits