1. 18 Oct, 2021 6 commits
    • kvm: x86: protect masterclock with a seqcount · 869b4421
      Paolo Bonzini authored
      Protect the reference point for kvmclock with a seqcount, so that
      kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
      updates will also run in parallel and not bounce the kvmclock cacheline.
      
      Of the variables that were protected by pvclock_gtod_sync_lock,
      nr_vcpus_matched_tsc is different because it is updated outside
      pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
      need to keep it protected by a spinlock.  In fact it must now
      be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
      write-side of a seqcount, is non-preemptible.  Since we already
      have tsc_write_lock which is a raw spinlock, we can just use
      tsc_write_lock as the lock that protects the write-side of the
      seqcount.
      Co-developed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-6-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
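The seqcount scheme described in this commit can be sketched in userspace C11. This is a minimal sketch under stated assumptions, not KVM's implementation: the struct, its two fields and the helpers are hypothetical, an `atomic_flag` spinlock stands in for `tsc_write_lock`, and the kernel uses its own seqcount primitives rather than these hand-rolled ones. Readers never take the lock; they retry if the sequence number was odd or changed while they read. The writer serializes on the lock and bumps the count before and after the update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical reference point, loosely modeled on the kvmclock master clock. */
struct masterclock {
	atomic_flag write_lock;     /* stand-in for tsc_write_lock */
	atomic_uint seq;            /* even: stable, odd: update in progress */
	_Atomic uint64_t kernel_ns;
	_Atomic uint64_t cycle_now;
};

static void mc_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;                   /* spin until the writer lock is free */
}

static void mc_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Write side: one writer at a time, serialized by write_lock. */
static void mc_update(struct masterclock *mc, uint64_t ns, uint64_t cycles)
{
	mc_lock(&mc->write_lock);
	unsigned int s = atomic_load_explicit(&mc->seq, memory_order_relaxed);
	assert((s & 1) == 0);       /* no other writer can be mid-update */
	atomic_store_explicit(&mc->seq, s + 1, memory_order_relaxed);   /* odd */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&mc->kernel_ns, ns, memory_order_relaxed);
	atomic_store_explicit(&mc->cycle_now, cycles, memory_order_relaxed);
	atomic_store_explicit(&mc->seq, s + 2, memory_order_release);   /* even */
	mc_unlock(&mc->write_lock);
}

/* Read side: lock-free, so many readers proceed in parallel; retry on a torn read. */
static void mc_read(struct masterclock *mc, uint64_t *ns, uint64_t *cycles)
{
	unsigned int start;

	do {
		start = atomic_load_explicit(&mc->seq, memory_order_acquire);
		*ns = atomic_load_explicit(&mc->kernel_ns, memory_order_relaxed);
		*cycles = atomic_load_explicit(&mc->cycle_now, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 atomic_load_explicit(&mc->seq, memory_order_relaxed) != start);
}
```

Usage: initialize with `struct masterclock mc = { .write_lock = ATOMIC_FLAG_INIT };`, call `mc_update()` from the updater and `mc_read()` from any number of readers.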
    • KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK · c68dc1b5
      Oliver Upton authored
      Handling the migration of TSCs correctly is difficult, in part because
      Linux does not provide userspace with the ability to retrieve a (TSC,
      realtime) clock pair for a single instant in time. In lieu of a more
      convenient facility, KVM can report similar information in the kvm_clock
      structure.
      
      Provide userspace with a host TSC & realtime pair iff the realtime clock
      is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
      realtime value, advance the KVM clock by the amount of elapsed time. Do
      not step the KVM clock backwards, though, as it is a monotonic
      oscillator.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
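The KVM_SET_CLOCK adjustment described in this commit reduces to simple arithmetic. The helper below is hypothetical and only models the rule stated in the commit message (advance by the elapsed realtime, never step backwards); it does not issue the ioctl and is not KVM's code. All values are assumed to be nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: kvmclock value to program on the destination, given
 * the (clock, realtime) pair saved on the source and the current realtime.
 */
static uint64_t kvmclock_after_migration(uint64_t saved_clock_ns,
					 uint64_t saved_realtime_ns,
					 uint64_t now_realtime_ns)
{
	/* Advance by the realtime that elapsed since the pair was saved... */
	if (now_realtime_ns > saved_realtime_ns)
		return saved_clock_ns + (now_realtime_ns - saved_realtime_ns);
	/* ...but never step the monotonic kvmclock backwards. */
	return saved_clock_ns;
}
```

If realtime on the destination appears to be behind the saved value, the clock is simply left where it was.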
    • KVM: x86: avoid warning with -Wbitwise-instead-of-logical · 3d5e7a28
      Paolo Bonzini authored
      This is a new warning in clang top-of-tree (will be clang 14):
      
      In file included from arch/x86/kvm/mmu/mmu.c:27:
      arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
              return __is_bad_mt_xwr(rsvd_check, spte) |
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                       ||
      arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning
      
      The code is fine, but change it anyway to shut up this clever clogs
      of a compiler.
      
      Reported-by: torvic9@mailbox.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
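The difference the warning points at can be shown in isolation. In this sketch the two predicates and their bit masks are hypothetical stand-ins for __is_bad_mt_xwr() and __is_rsvd_bits_set(). For side-effect-free boolean operands both variants return the same result, but `|` always evaluates both sides while `||` short-circuits. The cast form follows the compiler's own note; this sketch does not claim which form the actual patch used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicates; the masks are illustrative only. */
static bool bad_xwr(unsigned long spte)       { return (spte & 0x7) == 0x2; }
static bool rsvd_bits_set(unsigned long spte) { return (spte & 0xf0) != 0; }

/* Silenced as the clang note suggests: casting to int keeps the bitwise OR,
 * so both operands are always evaluated, without triggering the warning. */
static bool is_rsvd_bitwise(unsigned long spte)
{
	return (int)bad_xwr(spte) | (int)rsvd_bits_set(spte);
}

/* Logical OR: rsvd_bits_set() is not called when bad_xwr() is true. */
static bool is_rsvd_logical(unsigned long spte)
{
	return bad_xwr(spte) || rsvd_bits_set(spte);
}
```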
    • a25c78d0 (Paolo Bonzini)
    • KVM: X86: fix lazy allocation of rmaps · fa13843d
      Paolo Bonzini authored
      If allocation of rmaps fails after some of the pointers have already been
      written, those pointers can be cleaned up when the memslot is freed, or even
      reused later for another attempt at allocating the rmaps.  Therefore there is
      no need to WARN, as is done for example in memslot_rmap_alloc, but the
      allocation *must* be skipped for pointers that are already set; otherwise KVM
      would overwrite the previous pointer and actually leak memory.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
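The allocation rule in this fix can be sketched as follows. The struct, NR_RMAP_LEVELS and both helpers are hypothetical userspace stand-ins (calloc/free instead of the kernel allocators, and one array size for every level); only the control flow, skipping any pointer that is already set, reflects what the commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_RMAP_LEVELS 3  /* hypothetical number of page-size levels */

/* Hypothetical per-memslot rmap pointers. */
struct slot_rmaps {
	void **rmap[NR_RMAP_LEVELS];
};

/*
 * Allocate only the arrays that are still missing.  A pointer left over from
 * an earlier, partially failed attempt is kept: overwriting it would leak it.
 * On failure the partial state is left for free_rmaps() or a later retry.
 */
static bool alloc_rmaps(struct slot_rmaps *s, size_t npages)
{
	for (int i = 0; i < NR_RMAP_LEVELS; i++) {
		if (s->rmap[i])
			continue;
		s->rmap[i] = calloc(npages, sizeof(*s->rmap[i]));
		if (!s->rmap[i])
			return false;
	}
	return true;
}

static void free_rmaps(struct slot_rmaps *s)
{
	for (int i = 0; i < NR_RMAP_LEVELS; i++) {
		free(s->rmap[i]);
		s->rmap[i] = NULL;
	}
}
```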
    • KVM: x86/mmu: kvm_faultin_pfn has to return false if pfh is returned · a7cc099f
      Andrei Vagin authored
      This looks like a typo in 8f32d5e5, which was not intended to make any
      functional changes.
      
      The problem was caught by gVisor tests.
      
      Fixes: 8f32d5e5 ("KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code")
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Message-Id: <20211015163221.472508-1-avagin@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 15 Oct, 2021 2 commits
  3. 05 Oct, 2021 4 commits
  4. 04 Oct, 2021 18 commits
  5. 01 Oct, 2021 10 commits
    • KVM: x86: only allocate gfn_track when necessary · deae4a10
      David Stevens authored
      Avoid allocating the gfn_track arrays if nothing needs them. If there
      are no users of the API external to KVM (i.e. no GVT-g), then page
      tracking is only needed for shadow page tables. This means that when TDP
      is enabled and there are no external users, the gfn_track arrays can be
      lazily allocated when the shadow MMU is actually used. If the kernel is
      compiled without GVT-g, this avoids allocations equal to .05% of guest
      memory when nested virtualization is not used.
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210922045859.2011227-3-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
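The .05% figure is consistent with the following arithmetic, assuming (this is not stated in the commit) 4 KiB guest pages and one 16-bit tracking counter per page: 2 / 4096 ≈ 0.049%. The helper name and constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB guest pages, one 16-bit counter each. */
#define GUEST_PAGE_SIZE     4096ull
#define TRACK_COUNTER_BYTES 2ull

/* Hypothetical helper: bytes of gfn_track array needed for a given guest size. */
static uint64_t gfn_track_bytes(uint64_t guest_mem_bytes)
{
	return guest_mem_bytes / GUEST_PAGE_SIZE * TRACK_COUNTER_BYTES;
}
```

Under these assumptions a 16 GiB guest needs 8 MiB of tracking data, i.e. 1/2048 of guest memory.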
    • KVM: x86: add config for non-kvm users of page tracking · e9d0c0c4
      David Stevens authored
      Add a config option that allows KVM to determine whether or not there
      are any external users of page tracking.
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210922045859.2011227-2-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB · 174a921b
      Krish Sadhukhan authored
      According to section "TLB Flush" in APM vol 2,
      
          "Support for TLB_CONTROL commands other than the first two, is
           optional and is indicated by CPUID Fn8000_000A_EDX[FlushByAsid].
      
           All encodings of TLB_CONTROL not defined in the APM are reserved."
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20210920235134.101970-3-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
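The check implied by the quoted APM text can be sketched as a validity predicate. The encodings 00h (do nothing), 01h (flush entire TLB), 03h (flush this guest's entries) and 07h (flush this guest's non-global entries) are the ones the APM defines; the macro and function names are hypothetical, and the sketch does not show how KVM reacts to a reserved value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TLB_CONTROL encodings defined by the APM; macro names are illustrative. */
#define TLB_CTL_DO_NOTHING       0x00
#define TLB_CTL_FLUSH_ALL        0x01
#define TLB_CTL_FLUSH_ASID       0x03
#define TLB_CTL_FLUSH_ASID_LOCAL 0x07

/* has_flush_by_asid models CPUID Fn8000_000A_EDX[FlushByAsid]. */
static bool tlb_control_valid(uint8_t tlb_ctl, bool has_flush_by_asid)
{
	switch (tlb_ctl) {
	case TLB_CTL_DO_NOTHING:
	case TLB_CTL_FLUSH_ALL:
		return true;                 /* the first two are always supported */
	case TLB_CTL_FLUSH_ASID:
	case TLB_CTL_FLUSH_ASID_LOCAL:
		return has_flush_by_asid;    /* optional, gated by FlushByAsid */
	default:
		return false;                /* every other encoding is reserved */
	}
}
```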
    • kvm: use kvfree() in kvm_arch_free_vm() · 78b497f2
      Juergen Gross authored
      By switching from kfree() to kvfree() in kvm_arch_free_vm(), Arm64 can
      use the common variant. This can be accomplished by adding another
      macro, __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.
      
      Further simplification can be achieved by adding __kvm_arch_free_vm(),
      which does the common part.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Message-Id: <20210903130808.30142-5-jgross@suse.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
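The default-versus-override structure described in this commit can be sketched in userspace C. struct kvm is reduced to a placeholder and kvfree_stub() is a hypothetical stand-in for the kernel's kvfree() (plain free() plus a call counter so the sketch can be checked); this is not the actual kvm_host.h code.

```c
#include <assert.h>
#include <stdlib.h>

struct kvm { int placeholder; };  /* hypothetical stand-in */

static int nr_frees;              /* counts frees so the sketch can be checked */

/* In the kernel, kvfree() handles kmalloc() and vmalloc() memory; here it is free(). */
static void kvfree_stub(void *p)
{
	free(p);
	nr_frees++;
}

/* Common part, shared by all architectures. */
static inline void __kvm_arch_free_vm(struct kvm *kvm)
{
	kvfree_stub(kvm);
}

/*
 * An architecture that needs its own teardown defines __KVM_HAVE_ARCH_VM_FREE
 * and provides kvm_arch_free_vm() itself; everyone else gets this default.
 */
#ifndef __KVM_HAVE_ARCH_VM_FREE
static inline void kvm_arch_free_vm(struct kvm *kvm)
{
	__kvm_arch_free_vm(kvm);
}
#endif
```

An overriding architecture can still call __kvm_arch_free_vm() for the common part after its own cleanup.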
    • KVM: x86: Expose Predictive Store Forwarding Disable · b73a5432
      Babu Moger authored
      AMD Zen3 processors feature a new technology called Predictive Store
      Forwarding (PSF).
      
      PSF is a hardware-based micro-architectural optimization designed
      to improve the performance of code execution by predicting address
      dependencies between loads and stores.
      
      How PSF works:
      
      It is very common for a CPU to execute a load instruction to an address
      that was recently written by a store. Modern CPUs implement a technique
      known as Store-To-Load-Forwarding (STLF) to improve performance in such
      cases. With STLF, data from the store is forwarded directly to the load
      without having to wait for it to be written to memory. In a typical CPU,
      STLF occurs after the addresses of both the load and the store are
      calculated and determined to match.
      
      PSF expands on this by speculating on the relationship between loads and
      stores without waiting for the address calculation to complete. With PSF,
      the CPU learns over time the relationship between loads and stores. If
      STLF typically occurs between a particular store and load, the CPU will
      remember this.
      
      In typical code, PSF provides a performance benefit by speculating on
      the load result and allowing later instructions to begin execution
      sooner than they otherwise would be able to.
      
      The security analysis of AMD Predictive Store Forwarding is documented
      here:
      https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf
      
      Predictive Store Forwarding controls:
      There are two hardware control bits which influence the PSF feature:
      - MSR 48h bit 2 – Speculative Store Bypass (SSBD)
      - MSR 48h bit 7 – Predictive Store Forwarding Disable (PSFD)
      
      The PSF feature is disabled if either of these bits is set.  These bits
      are controllable on a per-thread basis in an SMT system. By default, both
      SSBD and PSFD are 0 meaning that the speculation features are enabled.
      
      While the SSBD bit disables PSF and speculative store bypass, PSFD only
      disables PSF.
      
      PSFD may be desirable for software which is concerned with the
      speculative behavior of PSF but desires a smaller performance impact than
      setting SSBD.
      
      Support for PSFD is indicated in CPUID Fn8000_0008 EBX[28].
      All processors that support PSF will also support PSFD.
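The control bits and the CPUID feature bit listed above can be expressed as simple bit tests. The bit positions (MSR 48h bits 2 and 7, CPUID Fn8000_0008 EBX[28]) come from this commit message; the macro and function names are hypothetical, and the functions operate on values the caller has already read rather than accessing the MSR or CPUID themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits in the per-thread MSR 48h value; names are illustrative. */
#define SPEC_CTRL_SSBD (1ull << 2)   /* Speculative Store Bypass Disable */
#define SPEC_CTRL_PSFD (1ull << 7)   /* Predictive Store Forwarding Disable */
#define CPUID_80000008_EBX_PSFD (1u << 28)

/* PSF is off if either SSBD or PSFD is set. */
static bool psf_disabled(uint64_t spec_ctrl)
{
	return (spec_ctrl & (SPEC_CTRL_SSBD | SPEC_CTRL_PSFD)) != 0;
}

/* Speculative store bypass is disabled only by SSBD, not by PSFD. */
static bool ssb_disabled(uint64_t spec_ctrl)
{
	return (spec_ctrl & SPEC_CTRL_SSBD) != 0;
}

/* PSFD support as reported in CPUID Fn8000_0008 EBX. */
static bool cpu_has_psfd(uint32_t cpuid_80000008_ebx)
{
	return (cpuid_80000008_ebx & CPUID_80000008_EBX_PSFD) != 0;
}
```

With GCC or Clang on x86, the EBX value could be obtained with `__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.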
      
      The Linux kernel does not yet have an interface to enable/disable PSFD.
      The plan here is to expose the PSFD technology to KVM so that guest
      kernels can make use of it if they wish to.
      Signed-off-by: Babu Moger <Babu.Moger@amd.com>
      Message-Id: <163244601049.30292.5855870305350227855.stgit@bmoger-ubuntu>
      [Keep feature private to KVM, as requested by Borislav Petkov. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Avoid memslot lookup in make_spte and mmu_try_to_unsync_pages · 53597858
      David Matlack authored
      mmu_try_to_unsync_pages checks if page tracking is active for the given
      gfn, which requires knowing the memslot. We can pass down the memslot
      via make_spte to avoid this lookup.
      
      The memslot is also handy for make_spte's marking of the gfn as dirty:
      we can test whether dirty page tracking is enabled, and if so ensure that
      pages are mapped as writable with 4K granularity.  Apart from the new
      warning, no functional change is intended.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Avoid memslot lookup in rmap_add · 8a9f566a
      David Matlack authored
      Avoid the memslot lookup in rmap_add by passing the memslot down from the
      fault handling code to mmu_set_spte and then to rmap_add.
      
      No functional change intended.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210813203504.2742757-6-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: pass struct kvm_page_fault to mmu_set_spte · a12f4381
      Paolo Bonzini authored
      mmu_set_spte is called for either PTE prefetching or page faults.  The
      three boolean arguments write_fault, speculative and host_writable are
      always false, true and true, respectively, for prefetching, and come
      from a struct kvm_page_fault for page faults.
      
      Let mmu_set_spte distinguish these two situations by accepting a
      possibly NULL struct kvm_page_fault argument.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
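Deriving the three flags from a possibly NULL fault pointer can be sketched like this. The struct names and the fields `write`, `prefetch` and `map_writable` are hypothetical stand-ins for the relevant parts of struct kvm_page_fault, and taking `speculative` from a prefetch flag is an illustrative assumption; only the false/true/true values for the NULL (prefetch) case come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of struct kvm_page_fault; field names are assumptions. */
struct fault_info {
	bool write;
	bool prefetch;
	bool map_writable;
};

struct spte_flags {
	bool write_fault;
	bool speculative;
	bool host_writable;
};

/* NULL means PTE prefetching: false/true/true. Otherwise read the fault. */
static struct spte_flags spte_flags_from_fault(const struct fault_info *fault)
{
	if (!fault)
		return (struct spte_flags){ .write_fault = false,
					    .speculative = true,
					    .host_writable = true };

	return (struct spte_flags){ .write_fault = fault->write,
				    .speculative = fault->prefetch,
				    .host_writable = fault->map_writable };
}
```

Passing a single nullable pointer replaces three positional booleans, so callers can no longer pass them in the wrong order.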
    • KVM: MMU: pass kvm_mmu_page struct to make_spte · 7158bee4
      Paolo Bonzini authored
      The level and A/D bit support of the new SPTE can be found in the role,
      which is stored in the kvm_mmu_page struct.  This merges two arguments
      into one.
      
      For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does
      not use it if the SPTE is already present), so we fetch it just before
      calling make_spte.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: set ad_disabled in TDP MMU role · 87e888ea
      Paolo Bonzini authored
      Prepare for removing the ad_disabled argument of make_spte; instead it can
      be found in the role of a struct kvm_mmu_page.  First of all, the TDP MMU
      must set the role accurately.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>