1. 02 Aug, 2021 31 commits
  2. 30 Jul, 2021 1 commit
    • KVM: x86: accept userspace interrupt only if no event is injected · fa7a549d
      Paolo Bonzini authored
      Once an exception has been injected, any side effects related to
      the exception (such as setting CR2 or DR6) have taken place.
      Therefore, once KVM sets the VM-entry interruption information
      field or the AMD EVENTINJ field, the next VM-entry must deliver that
      exception.
      
      Pending interrupts are processed after injected exceptions, so
      in theory it would not be a problem to use KVM_INTERRUPT when
      an injected exception is present.  However, DOSEMU is using
      run->ready_for_interrupt_injection to detect interrupt windows
      and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
      interrupt manually.  For this to work, the interrupt window
      must be delayed until after the previous event injection has
      completed.
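
      A minimal sketch of the kind of check this implies (simplified;
      ready_for_userspace_interrupt() is a made-up name, while
      kvm_arch_interrupt_allowed(), kvm_event_needs_reinjection() and
      kvm_cpu_accept_dm_intr() are existing helpers in arch/x86/kvm):

        /* Hypothetical helper: only report an interrupt window (and only
         * accept KVM_INTERRUPT) once any previously injected event has
         * been delivered, so that a manual injection via KVM_SET_REGS /
         * KVM_SET_SREGS, as DOSEMU does, cannot clobber it. */
        static bool ready_for_userspace_interrupt(struct kvm_vcpu *vcpu)
        {
                return kvm_arch_interrupt_allowed(vcpu) &&
                       !kvm_event_needs_reinjection(vcpu) &&
                       kvm_cpu_accept_dm_intr(vcpu);
        }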
      
      Cc: stable@vger.kernel.org
      Reported-by: Stas Sergeev <stsp2@yandex.ru>
      Tested-by: Stas Sergeev <stsp2@yandex.ru>
      Fixes: 71cc849b ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 27 Jul, 2021 8 commits
    • KVM: add missing compat KVM_CLEAR_DIRTY_LOG · 8750f9bb
      Paolo Bonzini authored
      The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
      therefore it needs a compat ioctl implementation.  Otherwise,
      32-bit userspace fails to invoke it on 64-bit kernels; for x86
      it might work fine by chance if the padding is zero, but not
      on big-endian architectures.
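
      The compat path essentially needs a 32-bit-pointer variant of the
      structure plus a conversion before calling the regular handler; a
      rough sketch (error handling trimmed, layout mirrors the uapi
      struct kvm_clear_dirty_log):

        struct compat_kvm_clear_dirty_log {
                __u32 slot;
                __u32 num_pages;
                __u64 first_page;
                union {
                        compat_uptr_t dirty_bitmap; /* 32-bit user pointer */
                        __u64 padding2;
                };
        };

        /* In kvm_vm_compat_ioctl(): widen the 32-bit pointer explicitly so
         * that big-endian 64-bit kernels read the correct half of the
         * union. */
        case KVM_CLEAR_DIRTY_LOG: {
                struct compat_kvm_clear_dirty_log compat_log;
                struct kvm_clear_dirty_log log;

                if (copy_from_user(&compat_log, (void __user *)arg,
                                   sizeof(compat_log)))
                        return -EFAULT;
                log.slot         = compat_log.slot;
                log.num_pages    = compat_log.num_pages;
                log.first_page   = compat_log.first_page;
                log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
                r = kvm_vm_ioctl_clear_dirty_log(kvm, &log);
                break;
        }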
      
      Reported-by: Thomas Sattler
      Cc: stable@vger.kernel.org
      Fixes: 2a31b9db ("kvm: introduce manual dirty log reprotect")
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: use cpu_relax when halt polling · 74775654
      Li RongQing authored
      SMT siblings share caches and other hardware, and busy halt polling
      on one sibling will degrade the other sibling's performance if that
      sibling is doing work.
      
      Sean Christopherson suggested the following:
      
      "Rather than disallowing halt-polling entirely, on x86 it should be
      sufficient to simply have the hardware thread yield to its sibling(s)
      via PAUSE.  It probably won't get back all performance, but I would
      expect it to be close.
      This compiles on all KVM architectures, and AFAICT the intended usage
      of cpu_relax() is identical for all architectures."
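
      Concretely, this amounts to a cpu_relax() call (PAUSE on x86) inside
      the busy-wait loop of kvm_vcpu_block(); a simplified sketch of that
      polling loop:

        /* Poll until the deadline, yielding the hardware thread to its
         * SMT sibling(s) on every iteration. */
        ktime_t cur, stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);

        do {
                if (kvm_vcpu_check_block(vcpu) < 0)
                        goto out;       /* wakeup condition, stop polling */
                cpu_relax();            /* be nice to the SMT sibling */
                cur = ktime_get();
        } while (single_task_running() && ktime_before(cur, stop));
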
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Message-Id: <20210727111247.55510-1-lirongqing@baidu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl · 5868b822
      Maxim Levitsky authored
      Currently when SVM is enabled in guest CPUID, AVIC is inhibited as soon
      as the guest CPUID is set.
      
      AVIC happens to be fully disabled on all vCPUs by the time any guest
      entry starts (even if, after migration, that first entry is already a
      nested one).
      
      The reason is that we currently disable AVIC right away on the vCPU
      from which kvm_request_apicv_update was called, and in this case it
      happens to be called on all vCPUs (by svm_vcpu_after_set_cpuid).
      
      Once we stop doing this, AVIC will end up being disabled only when
      KVM_REQ_APICV_UPDATE is processed, which is after we are done
      switching to the nested guest.
      
      Fix this by just using vmcb01 in svm_refresh_apicv_exec_ctrl for AVIC
      (which is the right thing to do anyway).
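
      A rough sketch of the resulting function (simplified from
      arch/x86/kvm/svm/avic.c): vmcb01 is the only VMCB in which AVIC is
      ever enabled, so it is the one to update regardless of which VMCB
      happens to be active when the request is processed.

        void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
        {
                struct vcpu_svm *svm = to_svm(vcpu);
                struct vmcb *vmcb = svm->vmcb01.ptr;    /* was: svm->vmcb */

                if (kvm_vcpu_apicv_active(vcpu))
                        vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
                else
                        vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;

                vmcb_mark_dirty(vmcb, VMCB_AVIC);
        }
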
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: tweak warning about enabled AVIC on nested entry · feea0136
      Maxim Levitsky authored
      It is possible that AVIC was requested to be disabled but has not
      actually been disabled yet, e.g. if the nested entry is done right
      after svm_vcpu_after_set_cpuid.
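
      In terms of existing helpers, the window being tolerated looks
      roughly like this (illustrative sketch, not the upstream diff):

        /* After kvm_request_apicv_update() records an inhibit reason, the
         * VM-level view already reports "inhibited" while this vCPU still
         * has AVIC active until it processes KVM_REQ_APICV_UPDATE. */
        bool inhibit_requested = !kvm_apicv_activated(vcpu->kvm);
        bool still_active      = kvm_vcpu_apicv_active(vcpu);

        /* inhibit_requested && still_active can briefly be true, e.g.
         * right after svm_vcpu_after_set_cpuid(), and the nested-entry
         * warning must not fire in that case. */
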
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated · f1577ab2
      Maxim Levitsky authored
      It is possible for the AVIC inhibit state and the AVIC active state to
      be mismatched. Currently we disable AVIC right away on the vCPU which
      started the AVIC inhibit request, so this warning doesn't trigger, but
      at least in theory, if svm_set_vintr is called at the same time on
      multiple vCPUs, the warning can happen.
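
      For context, a simplified sketch of svm_set_vintr() with a tolerant
      sanity check (the condition shown is illustrative, not the upstream
      diff): the V_IRQ fields it programs are ignored by hardware while
      AVIC is active, which is what the warning guards against.

        static void svm_set_vintr(struct vcpu_svm *svm)
        {
                struct vmcb_control_area *control = &svm->vmcb->control;

                /* V_IRQ is ignored while AVIC is active; only warn if AVIC
                 * is active and no deactivation has been requested at the
                 * VM level (illustrative condition). */
                WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu) &&
                        kvm_apicv_activated(svm->vcpu.kvm));

                svm_set_intercept(svm, INTERCEPT_VINTR);

                control->int_vector = 0x0;
                control->int_ctl &= ~V_INTR_PRIO_MASK;
                control->int_ctl |= V_IRQ_MASK |
                        (0x0f << V_INTR_PRIO_SHIFT);
                vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
        }
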
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: s390: restore old debugfs names · bb000f64
      Christian Borntraeger authored
      commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      replaced the old definitions with the binary ones. While doing that,
      it missed that some files are named differently than the counters.
      This is especially important for kvm_stat, which has special handling
      for counters named instruction_*.
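
      With the binary stats descriptors the debugfs file name is derived
      from the stringified struct field name, so keeping the old names
      means naming the fields accordingly; an illustrative sketch (the
      counter shown is just an example, the real list lives in
      arch/s390/kvm/kvm-s390.c):

        /* The descriptor macro stringifies the field name and that string
         * doubles as the debugfs file name, so the field must carry the
         * instruction_* prefix that kvm_stat keys on. */
        const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
                KVM_GENERIC_VCPU_STATS(),
                STATS_DESC_COUNTER(VCPU, instruction_lctl),
                /* ... */
        };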
      
      Fixes: bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      Cc: Jing Zhang <jingzhangos@google.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Message-Id: <20210726150108.5603-1-borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized · 3fa5e8fd
      Paolo Bonzini authored
      Right now, svm_hv_vmcb_dirty_nested_enlightenments has an incorrect
      dereference of vmcb->control.reserved_sw before the vmcb is checked
      for being non-NULL.  The compiler usually sinks the dereference below
      the check; rather than doing that reordering ourselves in the source,
      ensure that svm_hv_vmcb_dirty_nested_enlightenments is only called
      with a non-NULL VMCB.
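
      The problematic pattern, roughly (simplified from svm_onhyperv.h,
      function body trimmed):

        static inline void
        svm_hv_vmcb_dirty_nested_enlightenments(struct kvm_vcpu *vcpu)
        {
                struct vmcb *vmcb = to_svm(vcpu)->vmcb;
                struct hv_enlightenments *hve =
                        (struct hv_enlightenments *)vmcb->control.reserved_sw;
                        /* ^ dereferences vmcb ... */

                if (!vmcb)      /* ... before this check can help */
                        return;

                /* ... mark VMCB_HV_NESTED_ENLIGHTENMENTS dirty as needed */
        }
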
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Vineeth Pillai <viremana@linux.microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Untested for now due to issues with my AMD machine. - Paolo]
    • KVM: selftests: Introduce access_tracking_perf_test · c33e05d9
      David Matlack authored
      This test measures the performance effects of KVM's access tracking.
      Access tracking is driven by the MMU notifiers test_young, clear_young,
      and clear_flush_young. These notifiers do not have a direct userspace
      API; however, the clear_young notifier can be triggered by marking
      pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages
      that mechanism to enable access tracking on guest memory.
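
      For reference, a minimal userspace sketch of that mechanism (assumes
      root and CONFIG_IDLE_PAGE_TRACKING; error handling and the pagemap
      present-bit check are omitted):

        #include <fcntl.h>
        #include <stdint.h>
        #include <unistd.h>

        /* Mark the page backing host virtual address 'hva' idle; the
         * kernel clears its Accessed state via the clear_young MMU
         * notifier. */
        static void mark_page_idle(void *hva)
        {
                long psize = sysconf(_SC_PAGESIZE);
                int pagemap = open("/proc/self/pagemap", O_RDONLY);
                int bitmap = open("/sys/kernel/mm/page_idle/bitmap", O_WRONLY);
                uint64_t entry, pfn, bits;

                /* pagemap: one 8-byte entry per virtual page, PFN in bits 0-54. */
                pread(pagemap, &entry, sizeof(entry),
                      ((uintptr_t)hva / psize) * 8);
                pfn = entry & ((1ULL << 55) - 1);

                /* page_idle/bitmap: one bit per PFN, written in 64-bit chunks. */
                bits = 1ULL << (pfn % 64);
                pwrite(bitmap, &bits, sizeof(bits), (pfn / 64) * 8);

                close(pagemap);
                close(bitmap);
        }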
      
      To measure performance this test runs a VM with a configurable number of
      vCPUs that each touch every page in disjoint regions of memory.
      Performance is measured in the time it takes all vCPUs to finish
      touching their predefined region.
      
      Example invocation:
      
        $ ./access_tracking_perf_test -v 8
        Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
        guest physical test memory offset: 0xffdfffff000
      
        Populating memory             : 1.337752570s
        Writing to populated memory   : 0.010177640s
        Reading from populated memory : 0.009548239s
        Mark memory idle              : 23.973131748s
        Writing to idle memory        : 0.063584496s
        Mark memory idle              : 24.924652964s
        Reading from idle memory      : 0.062042814s
      
      Breaking down the results:
      
       * "Populating memory": The time it takes for all vCPUs to perform the
         first write to every page in their region.
      
       * "Writing to populated memory" / "Reading from populated memory": The
         time it takes for all vCPUs to write and read to every page in their
         region after it has been populated. This serves as a control for the
         later results.
      
       * "Mark memory idle": The time it takes for every vCPU to mark every
         page in their region as idle through page_idle.
      
       * "Writing to idle memory" / "Reading from idle memory": The time it
         takes for all vCPUs to write and read to every page in their region
         after it has been marked idle.
      
      This test should be portable across architectures but it is only enabled
      for x86_64 since that's all I have tested.
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210713220957.3493520-7-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>