08 Jan, 2021 32 commits
  07 Jan, 2021 8 commits
    • KVM: SVM: Add support for booting APs in an SEV-ES guest · 647daca2
      Tom Lendacky authored
      Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
      where the guest vCPU register state is updated and then the vCPU is run
      via VMRUN to begin execution of the AP. For an SEV-ES guest, this won't
      work because the guest register state is encrypted.
      
      Following the GHCB specification, the hypervisor must not alter the guest
      register state, so KVM must track an AP/vCPU boot. Should the guest want
      to park the AP, it must use the AP Reset Hold exit event in place of, for
      example, a HLT loop.
      
      First AP boot (first INIT-SIPI-SIPI sequence):
        Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
        support. It is up to the guest to transfer control of the AP to the
        proper location.
      
      Subsequent AP boot:
        KVM will expect to receive an AP Reset Hold exit event indicating that
        the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
        awaken it. When the AP Reset Hold exit event is received, KVM will place
        the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
        sequence, KVM will make the vCPU runnable. It is again up to the guest
        to then transfer control of the AP to the proper location.
      
        To differentiate between an actual HLT and an AP Reset Hold, a new MP
        state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
        placed in upon receiving the AP Reset Hold exit event. Additionally, to
        communicate the AP Reset Hold exit event up to userspace (if needed), a
        new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
      
      A new x86 ops function, vcpu_deliver_sipi_vector, is introduced in order
      to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
      original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
      a new function that, for non-SEV-ES guests, invokes the original SIPI
      delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests
      implements the logic above, as sketched below.
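
      A minimal sketch of how the SVM hook might look, assuming the names
      described above (sev_es_guest() is an existing SVM helper; the body is
      condensed from the description, not the verbatim patch):

        static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
        {
                /* Non-SEV-ES guests keep the original SIPI delivery path. */
                if (!sev_es_guest(vcpu->kvm)) {
                        kvm_vcpu_deliver_sipi_vector(vcpu, vector);
                        return;
                }

                /*
                 * SEV-ES: the register state is encrypted, so KVM cannot set
                 * CS:IP for the AP. The vCPU was parked in
                 * KVM_MP_STATE_AP_RESET_HOLD when the AP Reset Hold exit event
                 * arrived; all the hypervisor does is make it runnable again,
                 * and the guest transfers control to the proper location.
                 */
                vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
        }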
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit · f2c7ef3b
      Maxim Levitsky authored
      It is possible to exit nested guest mode, entered by
      svm_set_nested_state, prior to the first VM entry into it (e.g. due to
      a pending event), if the nested run was not pending during the
      migration.

      In this case we must not switch to the nested MSR permission bitmap.
      Also add a warning to catch similar cases in the future; see the
      sketch below.
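
      A hedged sketch of the shape of the fix (reconstructed from the
      description, not the verbatim diff): clear the deferred request on
      nested vmexit, and warn if the deferred work ever runs outside guest
      mode:

        static int nested_svm_vmexit(struct vcpu_svm *svm)
        {
                struct kvm_vcpu *vcpu = &svm->vcpu;

                /*
                 * A request to load nested state pages (including the nested
                 * MSR permission bitmap) must not survive a vmexit that
                 * happens before the first VM entry to the nested guest.
                 */
                kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);

                /* ... remainder of the vmexit path ... */
                return 0;
        }

        static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
        {
                /* Catch similar cases: this must only run in guest mode. */
                if (WARN_ON(!is_guest_mode(vcpu)))
                        return true;

                /* ... switch to the nested MSR permission bitmap ... */
                return true;
        }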
      
      Fixes: a7d5c7ce ("KVM: nSVM: delay MSR permission processing to first nested VM run")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode · 56fe28de
      Maxim Levitsky authored
      We overwrite most of the vmcb fields while forcibly leaving guest mode,
      so we must mark the vmcb as dirty, as in the sketch below.
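
      A minimal sketch, assuming svm_leave_nested() is the forced-exit path
      and using the existing vmcb_mark_all_dirty() helper, which invalidates
      all VMCB clean bits:

        void svm_leave_nested(struct vcpu_svm *svm)
        {
                if (is_guest_mode(&svm->vcpu)) {
                        /* ... restore L1 state, rewriting most vmcb fields ... */
                        vmcb_mark_all_dirty(svm->vmcb);
                }
        }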
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: correctly restore nested_run_pending on migration · 81f76ada
      Maxim Levitsky authored
      The code to store nested_run_pending on migration exists, but no code
      was restoring it.
      
      One of the side effects of fixing this is that L1->L2 injected events
      are no longer lost when migration happens with nested run pending.
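
      The restore side amounts to one line in svm_set_nested_state(); a
      sketch, assuming the flag travels in kvm_state->flags as it does on
      the save side:

        svm->nested.nested_run_pending =
                !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);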
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Clarify TDP MMU page list invariants · c0dba6e4
      Ben Gardon authored
      The tdp_mmu_roots and tdp_mmu_pages lists in struct kvm_arch should only
      contain pages with tdp_mmu_page set to true. tdp_mmu_pages should not
      contain any pages with a non-zero root_count, and tdp_mmu_roots should
      only contain pages with a positive root_count, unless a thread holds the
      MMU lock and is in the process of modifying the list. Various functions
      expect these invariants to be maintained, but they are not explicitly
      documented. Add to the comments on both fields to document the above
      invariants, along the lines of the sketch below.
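
      A sketch of the kind of field comments being added (wording condensed
      from the description above, not the exact patch text):

        struct kvm_arch {
                /* ... */

                /*
                 * Roots of the TDP MMU. Every page here has tdp_mmu_page
                 * set and a positive root_count, except while a thread
                 * holding the MMU lock is modifying the list.
                 */
                struct list_head tdp_mmu_roots;

                /*
                 * Non-root TDP MMU pages: tdp_mmu_page is set and
                 * root_count is zero, under the same locking exception.
                 */
                struct list_head tdp_mmu_pages;
        };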
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210107001935.3732070-2-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Ensure TDP MMU roots are freed after yield · a889ea54
      Ben Gardon authored
      Many TDP MMU functions which need to perform some action on all TDP MMU
      roots hold a reference on that root so that they can safely drop the MMU
      lock in order to yield to other threads. However, when releasing the
      reference on the root, there is a bug: the root will not be freed even
      if its reference count (root_count) is reduced to 0.
      
      To simplify acquiring and releasing references on TDP MMU root pages,
      and to ensure that these roots are properly freed, move the get/put
      operations into a dedicated TDP MMU root iterator macro, sketched below.
      
      Moving the get/put operations into an iterator macro also helps
      simplify control flow when a root does need to be freed. Note that using
      the list_for_each_entry_safe macro would not have been appropriate in
      this situation because it could keep a pointer to the next root across
      an MMU lock release + reacquire, during which time that root could be
      freed.
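
      A hedged sketch of the pattern (names modeled on the TDP MMU code,
      bodies simplified): the iterator takes a reference on the next root
      before the loop body runs and drops the previous reference when
      advancing, so a root whose count reaches zero is freed by the
      iterator's put rather than leaking:

        /*
         * Grab a reference to the next valid root, dropping (and possibly
         * freeing) the previous one; returns NULL past the end of the list.
         */
        static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
                                                      struct kvm_mmu_page *prev);

        #define for_each_tdp_mmu_root_yield_safe(_kvm, _root)          \
                for (_root = tdp_mmu_next_root(_kvm, NULL);            \
                     _root;                                            \
                     _root = tdp_mmu_next_root(_kvm, _root))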
      Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Fixes: 063afacd ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210107001935.3732070-1-bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: check tlbs_dirty directly · 88bf56d0
      Lai Jiangshan authored
      In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
              need_tlb_flush |= kvm->tlbs_dirty;
      with need_tlb_flush's type being int and tlbs_dirty's type being long.

      This means tlbs_dirty is always truncated to an int and its higher 32
      bits are discarded. We need to check tlbs_dirty in a correct way, so
      this change checks it directly without propagating it to
      need_tlb_flush; see the sketch below.
      
      Note: it's _extremely_ unlikely that neglecting the higher 32 bits can
      cause problems in practice. It would require tlbs_dirty to land exactly
      on a 4 billion count boundary, and KVM would need to be using shadow
      paging or be running a nested guest.
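
      A sketch of the shape of the fix in
      kvm_mmu_notifier_invalidate_range_start(), condensed from the
      description (surrounding code elided):

        int need_tlb_flush = 0;

        /* ... other flush conditions accumulate into need_tlb_flush ... */

        /*
         * Check the long-typed tlbs_dirty directly instead of OR-ing it
         * into the int-typed need_tlb_flush, which truncated it.
         */
        if (need_tlb_flush || kvm->tlbs_dirty)
                kvm_flush_remote_tlbs(kvm);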
      
      Cc: stable@vger.kernel.org
      Fixes: a4ee1ca4 ("KVM: MMU: delay flush all tlbs on sync_page path")
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: change in pv_eoi_get_pending() to make code more readable · de7860c8
      Stephen Zhang authored
      Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
      Message-Id: <1608277897-1932-1-git-send-email-stephenzhangzsd@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>