- 23 Nov, 2019 3 commits
-
-
Sean Christopherson authored
Fold shared_msr_update() into its sole user to eliminate its pointless bounds check, its godawful printk, its misleading comment (it's called under a global lock), and its woefully inaccurate name. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Miaohe Lin authored
The jump label out_free_1 and out_free_2 deal with the same stuff, so git rid of one and rename the label out_free_0a to retain the label name order. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
A recent change inadvertently exported a static function, which results in modpost throwing a warning. Fix it. Fixes: cbbaa272 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 21 Nov, 2019 11 commits
-
-
Paolo Bonzini authored
Preparatory work for shattering mmu.c into multiple files. Besides making it easier to follow, this will also make it possible to write unit tests for various parts. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use of the INVVPID/INVEPT Instruction, the hypervisor needs to execute INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and either: "Virtualize APIC accesses" VM-execution control was changed from 0 to 1, OR the value of apic_access_page was changed. In the nested case, the burden falls on L1, unless L0 enables EPT in vmcs02 but L1 enables neither EPT nor VPID in vmcs12. For this reason prepare_vmcs02() and load_vmcs12_host_state() have special code to request a TLB flush in case L1 does not use EPT but it uses "virtualize APIC accesses". This special case however is not necessary. On a nested vmentry the physical TLB will already be flushed except if all the following apply: * L0 uses VPID * L1 uses VPID * L0 can guarantee TLB entries populated while running L1 are tagged differently than TLB entries populated while running L2. If the first condition is false, the processor will flush the TLB on vmentry to L2. If the second or third condition are false, prepare_vmcs02() will request KVM_REQ_TLB_FLUSH. However, even if both are true, no extra TLB flush is needed to handle the APIC access page: * if L1 doesn't use VPID, the second condition doesn't hold and the TLB will be flushed anyway. * if L1 uses VPID, it has to flush the TLB itself with INVVPID and section 28.3.3.3 doesn't apply to L0. * even INVEPT is not needed because, if L0 uses EPT, it uses different EPTP when running L2 than L1 (because guest_mode is part of mmu-role). In this case SDM section 28.3.3.4 doesn't apply. Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(), one could note that L0 won't flush TLB only in cases where SDM sections 28.3.3.3 and 28.3.3.4 don't apply. In particular, if L0 uses different VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section 28.3.3.3 doesn't apply. Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit(). Side-note: This patch can be viewed as removing parts of commit fb6c8198 ("kvm: vmx: Flush TLB when the APIC-access address changes”) that is not relevant anymore since commit 1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”). i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT, then L0 will use same EPTP for both L0 and L1. Which indeed required L0 to execute INVEPT before entering L2 guest. This assumption is not true anymore since when guest_mode was added to mmu-role. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Mao Wenan authored
Fixes gcc '-Wunused-but-set-variable' warning: arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask: arch/x86/kvm/x86.c:7911:7: warning: variable called set but not used [-Wunused-but-set-variable] It is not used since commit 7ee30bc1 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Mao Wenan <maowenan@huawei.com> Fixes: 7ee30bc1 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
vmcs->apic_access_page is simply a token that the hypervisor puts into the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers APIC-access VMExit or APIC virtualization logic whenever a CPU running in VMX non-root mode read/write from/to this PFN. As every write either triggers an APIC-access VMExit or write is performed on vmcs->virtual_apic_page, the PFN pointed to by vmcs->apic_access_page should never actually be touched by CPU. Therefore, there is no need to mark vmcs02->apic_access_page as dirty after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Conflicts: arch/x86/kvm/vmx/vmx.c
-
Paolo Bonzini authored
If X86_FEATURE_RTM is disabled, the guest should not be able to access MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all transactions from the guest to abort. Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The current guest mitigation of TAA is both too heavy and not really sufficient. It is too heavy because it will cause some affected CPUs (those that have MDS_NO but lack TAA_NO) to fall back to VERW and get the corresponding slowdown. It is not really sufficient because it will cause the MDS_NO bit to disappear upon microcode update, so that VMs started before the microcode update will not be runnable anymore afterwards, even with tsx=on. Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for the guest and let it run without the VERW mitigation. Even though MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write it on every vmentry, we can use the shared MSR functionality because the host kernel need not protect itself from TSX-based side-channels. Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Because KVM always emulates CPUID, the CPUID clear bit (bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually" by the hypervisor when performing said emulation. Right now neither kvm-intel.ko nor kvm-amd.ko implement MSR_IA32_TSX_CTRL but this will change in the next patch. Reviewed-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
"Shared MSRs" are guest MSRs that are written to the host MSRs but keep their value until the next return to userspace. They support a mask, so that some bits keep the host value, but this mask is only used to skip an unnecessary MSR write and the value written to the MSR is always the guest MSR. Fix this and, while at it, do not update smsr->values[slot].curr if for whatever reason the wrmsr fails. This should only happen due to reserved bits, so the value written to smsr->values[slot].curr will not match when the user-return notifier and the host value will always be restored. However, it is untidy and in rare cases this can actually avoid spurious WRMSRs on return to userspace. Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR && !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not hidden (it actually was), yet the value says that TSX is not vulnerable to microarchitectural data sampling. Fix both. Cc: stable@vger.kernel.org Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarmPaolo Bonzini authored
KVM/arm updates for Linux 5.5: - Allow non-ISV data aborts to be reported to userspace - Allow injection of data aborts from userspace - Expose stolen time to guests - GICv4 performance improvements - vgic ITS emulation fixes - Simplify FWB handling - Enable halt pool counters - Make the emulated timer PREEMPT_RT compliant Conflicts: include/uapi/linux/kvm.h
-
- 20 Nov, 2019 5 commits
-
-
Liran Alon authored
Since commit 1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role"), guest_mode was added to mmu-role and therefore if L0 use EPT, it will always run L1 and L2 with different EPTP. i.e. EPTP01!=EPTP02. Because TLB entries are tagged with EP4TA, KVM can assume TLB entries populated while running L2 are tagged differently than TLB entries populated while running L1. Therefore, update nested_has_guest_tlb_tag() to consider if L0 use EPT instead of if L1 use EPT. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
The function is only used in kvm.ko module. Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Chenyi Qiang authored
When L1 guest uses 5-level paging, it fails vm-entry to L2 due to invalid host-state. It needs to add CR4_LA57 bit to nested CR4_FIXED1 MSR. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nitesh Narayan Lal authored
Not zeroing the bitmap used for identifying the destination vCPUs for an IOAPIC scan request in fixed delivery mode could lead to waking up unwanted vCPUs. This patch zeroes the vCPU bitmap before passing it to kvm_bitmap_or_dest_vcpus(), which is responsible for setting the bitmap with the bits corresponding to the destination vCPUs. Fixes: 7ee30bc1("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 18 Nov, 2019 1 commit
-
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: small fixes and enhancements - selftest improvements - yield improvements - cleanups
-
- 15 Nov, 2019 20 commits
-
-
Nitesh Narayan Lal authored
In IOAPIC fixed delivery mode instead of flushing the scan requests to all vCPUs, we should only send the requests to vCPUs specified within the destination field. This patch introduces kvm_get_dest_vcpus_mask() API which retrieves an array of target vCPUs by using kvm_apic_map_get_dest_lapic() and then based on the vcpus_idx, it sets the bit in a bitmap. However, if the above fails kvm_get_dest_vcpus_mask() finds the target vCPUs by traversing all available vCPUs. Followed by setting the bits in the bitmap. If we had different vCPUs in the previous request for the same redirection table entry then bits corresponding to these vCPUs are also set. This to done to keep ioapic_handled_vectors synchronized. This bitmap is then eventually passed on to kvm_make_vcpus_request_mask() to generate a masked request only for the target vCPUs. This would enable us to reduce the latency overhead on isolated vCPUs caused by the IPI to process due to KVM_REQ_IOAPIC_SCAN. Suggested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Radim Krčmář authored
Fetching an index for any vcpu in kvm->vcpus array by traversing the entire array everytime is costly. This patch remembers the position of each vcpu in kvm->vcpus array by storing it in vcpus_idx under kvm_vcpu structure. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Aaron Lewis authored
The L1 hypervisor may include the IA32_TIME_STAMP_COUNTER MSR in the vmcs12 MSR VM-exit MSR-store area as a way of determining the highest TSC value that might have been observed by L2 prior to VM-exit. The current implementation does not capture a very tight bound on this value. To tighten the bound, add the IA32_TIME_STAMP_COUNTER MSR to the vmcs02 VM-exit MSR-store area whenever it appears in the vmcs12 VM-exit MSR-store area. When L0 processes the vmcs12 VM-exit MSR-store area during the emulation of an L2->L1 VM-exit, special-case the IA32_TIME_STAMP_COUNTER MSR, using the value stored in the vmcs02 VM-exit MSR-store area to derive the value to be stored in the vmcs12 VM-exit MSR-store area. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Aaron Lewis authored
Rename function find_msr() to vmx_find_msr_index() in preparation for an upcoming patch where we export it and use it in nested.c. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Aaron Lewis authored
Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS. This needs to be done due to the addition of the MSR-autostore area that will be added in a future patch. After that the name AUTOLOAD will no longer make sense. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Aaron Lewis authored
Add the function read_and_check_msr_entry() which just pulls some code out of nested_vmx_store_msr(). This will be useful as reusable code in upcoming patches. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Correct a small inaccuracy in the shattering of vmx.c, which becomes visible now that pmu_intel.c includes nested.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Oliver Upton authored
The "load IA32_PERF_GLOBAL_CTRL" bit for VM-entry and VM-exit should only be exposed to the guest if IA32_PERF_GLOBAL_CTRL is a valid MSR. Create a new helper to allow pmu_refresh() to update the VM-Entry and VM-Exit controls to ensure PMU values are initialized when performing the is_valid_msr() check. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Oliver Upton authored
Add condition to prepare_vmcs02 which loads IA32_PERF_GLOBAL_CTRL on VM-entry if the "load IA32_PERF_GLOBAL_CTRL" bit on the VM-entry control is set. Use SET_MSR_OR_WARN() rather than directly writing to the field to avoid overwrite by atomic_switch_perf_msrs(). Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Oliver Upton authored
The existing implementation for loading the IA32_PERF_GLOBAL_CTRL MSR on VM-exit was incorrect, as the next call to atomic_switch_perf_msrs() could cause this value to be overwritten. Instead, call kvm_set_msr() which will allow atomic_switch_perf_msrs() to correctly set the values. Define a macro, SET_MSR_OR_WARN(), to set the MSR with kvm_set_msr() and WARN on failure. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Oliver Upton authored
Add a consistency check on nested vm-entry for host's IA32_PERF_GLOBAL_CTRL from vmcs12. Per Intel's SDM Vol 3 26.2.2: If the "load IA32_PERF_GLOBAL_CTRL" VM-exit control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that register" Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Oliver Upton authored
Add condition to nested_vmx_check_guest_state() to check the validity of GUEST_IA32_PERF_GLOBAL_CTRL. Per Intel's SDM Vol 3 26.3.1.1: If the "load IA32_PERF_GLOBAL_CTRL" VM-entry control is 1, bits reserved in the IA32_PERF_GLOBAL_CTRL MSR must be 0 in the field for that register. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Oliver Upton authored
Create a helper function to check the validity of a proposed value for IA32_PERF_GLOBAL_CTRL from the existing check in intel_pmu_set_msr(). Per Intel's SDM, the reserved bits in IA32_PERF_GLOBAL_CTRL must be cleared for the corresponding host/guest state fields. Suggested-by: Jim Mattson <jmattson@google.com> Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wainer dos Santos Moschetta authored
On kvm_create_max_vcpus test remove unneeded local variable in the loop that add vcpus to the VM. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
When KVM emulates a nested VMEntry (L1->L2 VMEntry), it switches mmu root page. If nEPT is used, this will happen from kvm_init_shadow_ept_mmu()->__kvm_mmu_new_cr3() and otherwise it will happpen from nested_vmx_load_cr3()->kvm_mmu_new_cr3(). Either case, __kvm_mmu_new_cr3() will use fast_cr3_switch() in attempt to switch to a previously cached root page. In case fast_cr3_switch() finds a matching cached root page, it will set it in mmu->root_hpa and request KVM_REQ_LOAD_CR3 such that on next entry to guest, KVM will set root HPA in appropriate hardware fields (e.g. vmcs->eptp). In addition, fast_cr3_switch() calls kvm_x86_ops->tlb_flush() in order to flush TLB as MMU root page was replaced. This works as mmu->root_hpa, which vmx_flush_tlb() use, was already replaced in cached_root_available(). However, this may result in unnecessary INVEPT execution because a KVM_REQ_TLB_FLUSH may have already been requested. For example, by prepare_vmcs02() in case L1 don't use VPID. Therefore, change fast_cr3_switch() to just request TLB flush on next entry to guest. Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Like Xu authored
Currently, a host perf_event is created for a vPMC functionality emulation. It’s unpredictable to determine if a disabled perf_event will be reused. If they are disabled and are not reused for a considerable period of time, those obsolete perf_events would increase host context switch overhead that could have been avoided. If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu sched time slice, and its independent enable bit of the vPMC isn't set, we can predict that the guest has finished the use of this vPMC, and then do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in. This lazy mechanism delays the event release time to the beginning of the next scheduled time slice if vPMC's MSRs aren't changed during this time slice. If guest comes back to use this vPMC in next time slice, a new perf event would be re-created via perf_event_create_kernel_counter() as usual. Suggested-by: Wei Wang <wei.w.wang@intel.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Like Xu authored
The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is a heavyweight and high-frequency operation, especially when host disables the watchdog (maximum 21000000 ns) which leads to an unacceptable latency of the guest NMI handler. It limits the use of vPMUs in the guest. When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop and release its existing perf_event (if any) every time EVEN in most cases almost the same requested perf_event will be created and configured again. For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl' for fixed) is the same as its current config AND a new sample period based on pmc->counter is accepted by host perf interface, the current event could be reused safely as a new created one does. Otherwise, do release the undesirable perf_event and reprogram a new one as usual. It's light-weight to call pmc_pause_counter (disable, read and reset event) and pmc_resume_counter (recalibrate period and re-enable event) as guest expects instead of release-and-create again on any condition. Compared to use the filterable event->attr or hw.config, a new 'u64 current_config' field is added to save the last original programed config for each vPMC. Based on this implementation, the number of calls to pmc_reprogram_counter is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event. In the usage of multiplexing perf sampling mode, the average latency of the guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up). If host disables watchdog, the minimum latecy of guest NMI handler could be speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average. Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Like Xu authored
Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*, and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) || .is_valid_msr(vcpu, msr)" and AMD just returns false from .is_valid_msr. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Like Xu authored
The leagcy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so this function actually receives the contents of ECX before RDPMC, and translates it to a kvm_pmc. Let's clarify its semantic by renaming the existing msr_idx_to_pmc to rdpmc_ecx_to_pmc, and is_valid_msr_idx to is_valid_rdpmc_ecx; likewise for the wrapper kvm_pmu_is_valid_msr_idx. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Like Xu authored
Exporting perf_event_pause() as an external accessor for kernel users (such as KVM) who may do both disable perf_event and read count with just one time to hold perf_event_ctx_lock. Also the value could be reset optionally. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-