    x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 · dd0fd8bc
    Waiman Long authored
    When running a fio sequential write test with an XFS ramdisk on a KVM
    guest hosted on a 2-socket x86-64 system, the %CPU times reported by
    perf were as follows:
    
     69.75%  0.59%  fio  [k] down_write
     69.15%  0.01%  fio  [k] call_rwsem_down_write_failed
     67.12%  1.12%  fio  [k] rwsem_down_write_failed
     63.48% 52.77%  fio  [k] osq_lock
      9.46%  7.88%  fio  [k] __raw_callee_save___kvm_vcpu_is_preempt
      3.93%  3.93%  fio  [k] __kvm_vcpu_is_preempted
    
    Making vcpu_is_preempted() a callee-save function has a relatively
    high cost on x86-64, primarily because the callee-save thunk saves
    and restores eight registers to and from the stack (at least one
    extra cacheline of data access) and adds one more level of function
    call.
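    For reference, the underlying check itself is tiny. A minimal userspace model (a plain array standing in for the kernel's per-CPU steal_time data; names here are hypothetical, not the kernel's API) illustrates why the wrapper's register spills can dominate the cost of the call:

    ```c
    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for the per-CPU steal_time.preempted byte; in the kernel
     * this flag is set by the host when it preempts the vCPU. */
    static unsigned char preempted[64];

    /* Modeled on __kvm_vcpu_is_preempted(): a single load and compare --
     * far less work than spilling and restoring eight registers around
     * the call. */
    static bool vcpu_is_preempted(long cpu)
    {
        return preempted[cpu] != 0;
    }

    int main(void)
    {
        preempted[5] = 1;
        printf("%d %d\n", vcpu_is_preempted(5), vcpu_is_preempted(6));
        return 0;
    }
    ```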
    
    To reduce this performance overhead, an optimized assembly version
    of the __raw_callee_save___kvm_vcpu_is_preempt() function is
    provided for x86-64.
    
    With this patch applied on a KVM guest on a 2-socket 16-core 32-thread
    system with 16 parallel jobs (8 on each socket), the aggregate
    bandwidth of the fio test on an XFS ramdisk was as follows:
    
       I/O Type      w/o patch    with patch
       --------      ---------    ----------
       random read   8141.2 MB/s  8497.1 MB/s
       seq read      8229.4 MB/s  8304.2 MB/s
       random write  1675.5 MB/s  1701.5 MB/s
       seq write     1681.3 MB/s  1699.9 MB/s
    
    The patch yields a small increase in aggregate bandwidth for every
    I/O type.
    
    With the patch applied, the perf data became:
    
     70.78%  0.58%  fio  [k] down_write
     70.20%  0.01%  fio  [k] call_rwsem_down_write_failed
     69.70%  1.17%  fio  [k] rwsem_down_write_failed
     59.91% 55.42%  fio  [k] osq_lock
     10.14% 10.14%  fio  [k] __kvm_vcpu_is_preempted
    
    The assembly code was verified with a test kernel module that
    compares the output of the C __kvm_vcpu_is_preempted() against that
    of the assembly __raw_callee_save___kvm_vcpu_is_preempt() and
    confirms that they match.
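    A userspace sketch of that verification idea (not the actual kernel module: a plain array replaces the per-CPU steal_time data, symbol names are hypothetical, and the assembly assumes x86-64 and the System V calling convention) might look like this. Note the hand-written version clobbers only %rax, so no callee-saved registers need to be spilled:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Stand-in for the per-CPU steal_time.preempted flags. */
    #define NR_CPUS 64
    unsigned char preempted[NR_CPUS];

    /* C version, modeled on __kvm_vcpu_is_preempted(). */
    static int c_vcpu_is_preempted(long cpu)
    {
        return preempted[cpu] != 0;
    }

    /* Hand-written x86-64 version: one load, one compare, one setcc.
     * Only %rax/%al is touched, so nothing is saved to the stack. */
    asm(".pushsection .text\n"
        ".global asm_vcpu_is_preempted\n"
        "asm_vcpu_is_preempted:\n"
        "  leaq preempted(%rip), %rax\n"
        "  cmpb $0, (%rax,%rdi,1)\n"
        "  setne %al\n"
        "  movzbl %al, %eax\n"
        "  ret\n"
        ".popsection\n");
    extern int asm_vcpu_is_preempted(long cpu);

    int main(void)
    {
        /* Mark a few CPUs preempted, then check both versions agree. */
        preempted[3] = preempted[17] = 1;
        for (long cpu = 0; cpu < NR_CPUS; cpu++)
            assert(c_vcpu_is_preempted(cpu) == asm_vcpu_is_preempted(cpu));
        puts("match");
        return 0;
    }
    ```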
    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Waiman Long <longman@redhat.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>