• Tomasz Grabiec's avatar
    KVM: x86: fix TSC matching · 0d3da0d2
    Tomasz Grabiec authored
    I've observed kvmclock being marked as unstable on a modern
    single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.
    
    The culprit was failure in TSC matching because of overflow of
    kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
    in a single synchronization cycle.
    
    Turns out that qemu does multiple TSC writes during init, below is the
    evidence of that (qemu-2.0.0):
    
    The first one:
    
     0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
     0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
     0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
     0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]
    
    The second one:
    
     0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
     0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
     0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
     0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
     0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
     0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
     0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
    
     #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
     #1  kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
     #2  kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
     #3  kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
     #4  cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
     #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
     #6  main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390
    
    The third one:
    
     0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
     0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
     0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
     0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
     0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
     0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
     0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
    
     #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
     #1  kvm_put_msrs  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
     #2  kvm_arch_put_registers  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
     #3  kvm_cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
     #4  cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
     #5  cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
     #6  main  at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482
    
    The fix is to count each vCPU only once when matched, so that
    nr_vcpus_matched_tsc holds the size of the matched set. This is
    achieved by reusing generation counters. Every vCPU with
    this_tsc_generation == cur_tsc_generation is in the matched set. The
    match set is cleared by setting cur_tsc_generation to a value which no
    other vCPU is set to (by incrementing it).
    
    I needed to bump up the counter size form u8 to u64 to ensure it never
    overflows. Otherwise in cases TSC is not written the same number of
    times on each vCPU the counter could overflow and incorrectly indicate
    some vCPUs as being in the matched set. This scenario seems unlikely
    but I'm not sure if it can be disregarded.
    Signed-off-by: default avatarTomasz Grabiec <tgrabiec@cloudius-systems.com>
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
    0d3da0d2
x86.c 190 KB