• Peter Xu's avatar
    KVM: selftests: Sync data verify of dirty logging with guest sync · 016ff1a4
    Peter Xu authored
    This fixes a bug that can trigger with e.g. "taskset -c 0 ./dirty_log_test" or
    when the testing host is very busy.
    
    A similar previous attempt is done [1] but that is not enough, the reason is
    stated in the reply [2].
    
    As a summary (partly quotting from [2]):
    
    The problem is I think one guest memory write operation (of this specific test)
    contains a few micro-steps when page is during kvm dirty tracking (here I'm
    only considering write-protect rather than pml but pml should be similar at
    least when the log buffer is full):
    
      (1) Guest read 'iteration' number into register, prepare to write, page fault
      (2) Set dirty bit in either dirty bitmap or dirty ring
      (3) Return to guest, data written
    
    When we verify the data, we assumed that all these steps are "atomic", say,
    when (1) happened for this page, we assume (2) & (3) must have happened.  We
    had some trick to workaround "un-atomicity" of above three steps, as previous
    version of this patch wanted to fix atomicity of step (2)+(3) by explicitly
    letting the main thread wait for at least one vmenter of vcpu thread, which
    should work.  However what I overlooked is probably that we still have race
    when (1) and (2) can be interrupted.
    
    One example calltrace when it could happen that we read an old interation, got
    interrupted before even setting the dirty bit and flushing data:
    
        __schedule+1742
        __cond_resched+52
        __get_user_pages+530
        get_user_pages_unlocked+197
        hva_to_pfn+206
        try_async_pf+132
        direct_page_fault+320
        kvm_mmu_page_fault+103
        vmx_handle_exit+288
        vcpu_enter_guest+2460
        kvm_arch_vcpu_ioctl_run+325
        kvm_vcpu_ioctl+526
        __x64_sys_ioctl+131
        do_syscall_64+51
        entry_SYSCALL_64_after_hwframe+68
    
    It means iteration number cached in vcpu register can be very old when dirty
    bit set and data flushed.
    
    So far I don't see an easy way to guarantee all steps 1-3 atomicity but to sync
    at the GUEST_SYNC() point of guest code when we do verification of the dirty
    bits as what this patch does.
    
    [1] https://lore.kernel.org/lkml/20210413213641.23742-1-peterx@redhat.com/
    [2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/
    
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Andrew Jones <drjones@redhat.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
    Message-Id: <20210417143602.215059-2-peterx@redhat.com>
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
    016ff1a4
dirty_log_test.c 25.8 KB