- 21 Jul, 2014 3 commits
-
-
Nadav Amit authored
RFLAGS.RF is always zero after popf. Therefore, popf should not updated RF, as anyhow emulating popf, just as any other instruction should clear RFLAGS.RF. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
When skipping an emulated instruction, rflags.rf should be cleared as it would be on real x86 CPU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge tag 'kvm-s390-20140715' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next This series enables the "KVM_(S|G)ET_MP_STATE" ioctls on s390 to make the cpu state settable by user space. This is necessary to avoid races in s390 SIGP/reset handling which happen because some SIGPs are handled in QEMU, while others are handled in the kernel. Together with the busy conditions as return value of SIGP races happen especially in areas like starting and stopping of CPUs. (For example, there is a program 'cpuplugd', that runs on several s390 distros which does automatic onlining and offlining on cpus.) As soon as the MPSTATE interface is used, user space takes complete control of the cpu states. Otherwise the kernel will use the old way. Therefore, the new kernel continues to work fine with old QEMUs.
-
- 17 Jul, 2014 1 commit
-
-
Wanpeng Li authored
This patch fix bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331, after the patch http://www.spinics.net/lists/kvm/msg105230.html applied, there is some progress and the L2 can boot up, however, slowly. The original idea of this fix vid injection patch is from "Zhang, Yang Z" <yang.z.zhang@intel.com>. Interrupt which delivered by vid should be injected to L1 by L0 if current is in L1, or should be injected to L2 by L0 through the old injection way if L1 doesn't have set External-interrupt exiting bit. The current logic doen't consider these cases. This patch fix it by vid intr to L1 if current is L1 or L2 through old injection way if L1 doen't have External-interrupt exiting bit set. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: "Zhang, Yang Z" <yang.z.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 11 Jul, 2014 24 commits
-
-
Nadav Amit authored
Certain instructions (e.g., mwait and monitor) cause a #UD exception when they are executed in user mode. This is in contrast to the regular privileged instructions which cause #GP. In order not to mess with SVM interception of mwait and monitor which assumes privilege level assertions take place before interception, a flag has been added. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Certain instructions, such as monitor and xsave do not support big real mode and cause a #GP exception if any of the accessed bytes effective address are not within [0, 0xffff]. This patch introduces a flag to mark these instructions, including the necassary checks. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Emulator accesses are always done a page at a time, either by the emulator itself (for fetches) or because we need to query the MMU for address translations. Speed up these accesses by using kvm_read_guest_page and, in the case of fetches, by inlining kvm_read_guest_virt_helper and dropping the loop around kvm_read_guest_page. This final tweak saves 30-100 more clock cycles (4-10%), bringing the count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series and 925-1700 after the first two low-hanging fruit changes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
When the CS base is not page-aligned, the linear address of the code could get close to the page boundary (e.g. 0x...ffe) even if the EIP value is not. So we need to first linearize the address, and only then compute the number of valid bytes that can be fetched. This happens relatively often when executing real mode code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This simplifies the code a bit, especially the overflow checks. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
We do not need a memory copying loop anymore in insn_fetch; we can use a byte-aligned pointer to access instruction fields directly from the fetch_cache. This eliminates 50-150 cycles (corresponding to a 5-10% improvement in performance) from each instruction. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
do_insn_fetch_bytes will only be called once in a given insn_fetch and insn_fetch_arr, because in fact it will only be called at most twice for any instruction and the first call is explicit in x86_decode_insn. This observation lets us hoist the call out of the memory copying loop. It does not buy performance, because most fetches are one byte long anyway, but it prepares for the next patch. The overflow check is tricky, but correct. Because do_insn_fetch_bytes has already been called once, we know that fc->end is at least 15. So it is okay to subtract the number of bytes we want to read. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Hoist the common case up from do_insn_fetch_byte to do_insn_fetch, and prime the fetch_cache in x86_decode_insn. This helps a bit the compiler and the branch predictor, but above all it lays the ground for further changes in the next few patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Bandan Das authored
rip_relative is only set if decode_modrm runs, and if you have ModRM you will also have a memopp. We can then access memopp unconditionally. Note that rip_relative cannot be hoisted up to decode_modrm, or you break "mov $0, xyz(%rip)". Also, move typecast on "out of range value" of mem.ea to decode_modrm. Together, all these optimizations save about 50 cycles on each emulated instructions (4-6%). Signed-off-by: Bandan Das <bsd@redhat.com> [Fix immediate operands with rip-relative addressing. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Bandan Das authored
x86_decode_insn already sets a default for seg_override, so remove it from the zeroed area. Also replace set/get functions with direct access to the field. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Bandan Das authored
A lot of initializations are unnecessary as they get set to appropriate values before actually being used. Optimize placement of fields in x86_emulate_ctxt Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Bandan Das authored
Remove the if conditional - that will help us avoid an "else initialize to 0" Also, rearrange operators for slightly better code. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Bandan Das authored
The same information can be gleaned from ctxt->d and avoids having to zero/NULL initialize intercept and check_perm Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Bandan Das authored
Core emulator functions all belong in emulator.c, x86 should have no knowledge of emulator internals Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The "if/return" checks are useless, because we return X86EMUL_CONTINUE anyway if we do not return. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst. write_register_operand will take care of writing only the lower bytes. Avoiding a call to memcpy (the compiler optimizes it out) gains about 200 cycles on kvm-unit-tests for register-to-register moves, and makes them about as fast as arithmetic instructions. We could perhaps get a larger speedup by moving all instructions _except_ moves out of x86_emulate_insn, removing opcode_len, and replacing the switch statement with an inlined em_mov. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
There are several checks for "peculiar" aspects of instructions in both x86_decode_insn and x86_emulate_insn. Group them together, and guard them with a single "if" that lets the processor quickly skip them all. Make this more effective by adding two more flag bits that say whether the .intercept and .check_perm fields are valid. We will reuse these flags later to avoid initializing fields of the emulate_ctxt struct. This skims about 30 cycles for each emulated instructions, which is approximately a 3% improvement. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The only purpose of this patch is to make the next patch simpler to review. No semantic change. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Despite the provisions to emulate up to 130 consecutive instructions, in practice KVM will emulate just one before exiting handle_invalid_guest_state, because x86_emulate_instruction always sets KVM_REQ_EVENT. However, we only need to do this if an interrupt could be injected, which happens a) if an interrupt shadow bit (STI or MOV SS) has gone away; b) if the interrupt flag has just been set (other instructions than STI can set it without enabling an interrupt shadow). This cuts another 700-900 cycles from the cost of emulating an instruction (measured on a Sandy Bridge Xeon: 1650-2600 cycles before the patch on kvm-unit-tests, 925-1700 afterwards). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
For the next patch we will need to know the full state of the interrupt shadow; we will then set KVM_REQ_EVENT when one bit is cleared. However, right now get_interrupt_shadow only returns the one corresponding to the emulated instruction, or an unconditional 0 if the emulated instruction does not have an interrupt shadow. This is confusing and does not allow us to check for cleared bits as mentioned above. Clean the callback up, and modify toggle_interruptibility to match the comment above the call. As a small result, the call to set_interrupt_shadow will be skipped in the common case where int_shadow == 0 && mask == 0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
About 25% of the time spent in emulation of invalid guest state is wasted in checking whether emulation is required for the next instruction. However, this almost never changes except when a segment register (or TR or LDTR) changes, or when there is a mode transition (i.e. CR0 changes). In fact, vmx_set_segment and vmx_set_cr0 already modify vmx->emulation_required (except that the former for some reason uses |= instead of just an assignment). So there is no need to call guest_state_valid in the emulation loop. Emulation performance test results indicate 1650-2600 cycles for common instructions, versus 2300-3200 before this patch on a Sandy Bridge Xeon. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Matthias Lange authored
Since commit 575203 the MCE subsystem in the Linux kernel for AMD sets bit 18 in MSR_K7_HWCR. Running such a kernel as a guest in KVM on an AMD host results in a GPE injected into the guest because kvm_set_msr_common returns 1. This patch fixes this by masking bit 18 from the MSR value desired by the guest. Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
We encountered a scenario in which after an INIT is delivered, a pending interrupt is delivered, although it was sent before the INIT. As the SDM states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT as KVM does. This also means that pending interrupts should be cleared. This patch clears upon reset (and INIT) the pending interrupts; and at the same occassion clears the pending exceptions, since they may cause a similar issue. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
We have noticed that qemu-kvm hangs early in the BIOS when runnning nested under some versions of VMware ESXi. The problem we believe is because KVM assumes that the platform preserves the 'G' but for any segment register. The SVM specification itemizes the segment attribute bits that are observed by the CPU, but the (G)ranularity bit is not one of the bits itemized, for any segment. Though current AMD CPUs keep track of the (G)ranularity bit for all segment registers other than CS, the specification does not require it. VMware's virtual CPU may not track the (G)ranularity bit for any segment register. Since kvm already synthesizes the (G)ranularity bit for the CS segment. It should do so for all segments. The patch below does that, and helps get rid of the hangs. Patch applies on top of Linus' tree. Signed-off-by: Jim Mattson <jmattson@vmware.com> Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Jul, 2014 5 commits
-
-
David Hildenbrand authored
This patch - adds s390 specific MP states to linux headers and documents them - implements the KVM_{SET,GET}_MP_STATE ioctls - enables KVM_CAP_MP_STATE - allows user space to control the VCPU state on s390. If user space sets the VCPU state using the ioctl KVM_SET_MP_STATE, we can disable manual changing of the VCPU state and trust user space to do the right thing. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
Highlight the aspects of the ioctls that are actually specific to x86 and ia64. As defined restrictions (irqchip) and mp states may not apply to other architectures, these parts are flagged to belong to x86 and ia64. In preparation for the use of KVM_(S|G)ET_MP_STATE by s390. Fix a spelling error (KVM_SET_MP_STATE vs. KVM_SET_MPSTATE) on the way. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
The function "__cpu_is_stopped" is not used any more. Let's remove it and expose the function "is_vcpu_stopped" instead, which is actually what we want. This patch also converts an open coded check for CPUSTAT_STOPPED to is_vcpu_stopped(). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
Let's move the finalization of SIGP STOP and SIGP STOP AND STORE STATUS orders to the point where the VCPU is actually stopped. This change is needed to prepare for a user space driven VCPU state change. The action_bits may only be cleared when setting the cpu state to STOPPED while holding the local irq lock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
A SIGP STOP (AND STORE STATUS) order is complete as soon as the VCPU has been stopped. This patch makes sure that only one SIGP STOP (AND STORE STATUS) may be pending at a time (as defined by the architecture). If the action_bits are still set, a SIGP STOP has been issued but not completed yet. The VCPU is busy for further SIGP STOP orders. Also set the CPUSTAT_STOP_INT after the action_bits variable has been modified (the same order that is used when injecting a KVM_S390_SIGP_STOP from userspace). Both changes are needed in preparation for a user space driven VCPU state change (to avoid race conditions). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 09 Jul, 2014 7 commits
-
-
James Hogan authored
Document the MIPS specific parts of the KVM API, including: - The layout of the kvm_regs structure. - The interrupt number passed to KVM_INTERRUPT. - The registers supported by the KVM_{GET,SET}_ONE_REG interface, and the encoding of those register ids. - That KVM_INTERRUPT and KVM_GET_REG_LIST are supported on MIPS. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
James Hogan authored
Some of the MIPS registers that can be accessed with the KVM_{GET,SET}_ONE_REG interface have fairly long names, so widen the Register column of the table in the KVM_SET_ONE_REG documentation to allow them to fit. Tabs in the table are replaced with spaces at the same time for consistency. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
James Hogan authored
KVM_SET_SIGNAL_MASK is implemented in generic code and isn't x86 specific, so document it as being applicable for all architectures. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
In two cases lapic.c does not use the apic_debug macro correctly. This patch fixes them. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tomasz Grabiec authored
I've observed kvmclock being marked as unstable on a modern single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0. The culprit was failure in TSC matching because of overflow of kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes in a single synchronization cycle. Turns out that qemu does multiple TSC writes during init, below is the evidence of that (qemu-2.0.0): The first one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm] 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm] The second one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] #0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 #1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 #2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 #3 kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641 #4 cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330 #5 cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521 #6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390 The third one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] #0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 #1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 #2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 #3 kvm_cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635 #4 cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323 #5 cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512 #6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482 The fix is to count each vCPU only once when matched, so that nr_vcpus_matched_tsc holds the size of the matched set. This is achieved by reusing generation counters. Every vCPU with this_tsc_generation == cur_tsc_generation is in the matched set. The match set is cleared by setting cur_tsc_generation to a value which no other vCPU is set to (by incrementing it). I needed to bump up the counter size form u8 to u64 to ensure it never overflows. Otherwise in cases TSC is not written the same number of times on each vCPU the counter could overflow and incorrectly indicate some vCPUs as being in the matched set. This scenario seems unlikely but I'm not sure if it can be disregarded. Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jan Kiszka authored
Obtaining the port number from DX is bogus as a) there are immediate port accesses and b) user space may have changed the register content while processing the PIO access. Forward the correct value from the instruction emulator instead. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jan Kiszka authored
The access size of an in/ins is reported in dst_bytes, and that of out/outs in src_bytes. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-