    virtio-crypto: wait ctrl queue instead of busy polling · 977231e8
    zhenwei pi authored
    Originally, after submitting a request to the virtio crypto control
    queue, the guest side polls the virtqueue for the result. This
    works as follows:
        CPU0   CPU1               ...             CPUx  CPUy
         |      |                                  |     |
         \      \                                  /     /
          \--------spin_lock(&vcrypto->ctrl_lock)-------/
                               |
                     virtqueue add & kick
                               |
                      busy poll virtqueue
                               |
                  spin_unlock(&vcrypto->ctrl_lock)
                              ...
    
    There are two problems:
    1, The queue depth is always 1, which limits the performance of a
       virtio crypto device. Multiple user processes share a single
       control queue and contend for its spin lock. Tested on an Intel
       Platinum 8260, a single worker gets ~35K/s create/close session
       operations, and 8 workers get ~40K/s operations with 800% CPU
       utilization.
    2, A control request is supposed to be handled immediately, but
       in the current implementation of QEMU (v6.2), the vCPU thread kicks
       another thread to do this work, so the latency becomes unstable.
       Latency of virtio_crypto_alg_akcipher_close_session traced over 5s:
            usecs               : count     distribution
             0 -> 1          : 0        |                        |
             2 -> 3          : 7        |                        |
             4 -> 7          : 72       |                        |
             8 -> 15         : 186485   |************************|
            16 -> 31         : 687      |                        |
            32 -> 63         : 5        |                        |
            64 -> 127        : 3        |                        |
           128 -> 255        : 1        |                        |
           256 -> 511        : 0        |                        |
           512 -> 1023       : 0        |                        |
          1024 -> 2047       : 0        |                        |
          2048 -> 4095       : 0        |                        |
          4096 -> 8191       : 0        |                        |
          8192 -> 16383      : 2        |                        |
    This means that a CPU may hold vcrypto->ctrl_lock as long as 8192~16383us.
    
    To improve the performance of the control queue, a request on the
    control queue now waits for completion instead of busy polling, which
    reduces lock contention. The request is completed by the control
    queue callback.
        CPU0   CPU1               ...             CPUx  CPUy
         |      |                                  |     |
         \      \                                  /     /
          \--------spin_lock(&vcrypto->ctrl_lock)-------/
                               |
                     virtqueue add & kick
                               |
          ---------spin_unlock(&vcrypto->ctrl_lock)------
         /      /                                  \     \
         |      |                                  |     |
        wait   wait                               wait  wait
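
The flow in the diagram above can be approximated in userspace. The following is a minimal sketch, not the driver code: a pthread mutex stands in for the spin lock, a mutex/condvar pair stands in for a kernel-style completion, and a "device" thread plays the role of the control queue callback. All names (ctrl_request, ctrl_submit_and_wait, device_thread, run_demo) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NWORKERS 8

/* Userspace stand-in for a kernel-style completion (assumption). */
struct completion { pthread_mutex_t lock; pthread_cond_t cond; bool done; };

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)                 /* sleep, do not spin */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical control request, one per submitter. */
struct ctrl_request { struct completion comp; int status; struct ctrl_request *next; };

static pthread_mutex_t ctrl_lock = PTHREAD_MUTEX_INITIALIZER; /* spin lock stand-in */
static pthread_cond_t ctrl_kick = PTHREAD_COND_INITIALIZER;   /* "kick" stand-in */
static struct ctrl_request *pending;
static bool stopping;
static struct ctrl_request reqs[NWORKERS];

/* Hold ctrl_lock only for "add & kick", then wait outside the lock. */
static int ctrl_submit_and_wait(struct ctrl_request *req)
{
    init_completion(&req->comp);
    req->status = -1;
    pthread_mutex_lock(&ctrl_lock);
    req->next = pending;
    pending = req;
    pthread_cond_signal(&ctrl_kick);
    pthread_mutex_unlock(&ctrl_lock);
    wait_for_completion(&req->comp);
    return req->status;
}

/* Plays the role of the control queue callback: completes requests. */
static void *device_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&ctrl_lock);
        while (!pending && !stopping)
            pthread_cond_wait(&ctrl_kick, &ctrl_lock);
        struct ctrl_request *req = pending;
        bool stop = stopping;
        pending = NULL;
        pthread_mutex_unlock(&ctrl_lock);
        while (req) {
            struct ctrl_request *next = req->next; /* read before waking owner */
            req->status = 0;
            complete(&req->comp);
            req = next;
        }
        if (stop)
            return NULL;
    }
}

static void *worker(void *arg)
{
    ctrl_submit_and_wait(arg);
    return NULL;
}

/* Runs NWORKERS submitters; returns how many requests completed with status 0. */
static int run_demo(void)
{
    pthread_t dev, workers[NWORKERS];
    int i, ok = 0;

    stopping = false;
    assert(pthread_create(&dev, NULL, device_thread, NULL) == 0);
    for (i = 0; i < NWORKERS; i++)
        assert(pthread_create(&workers[i], NULL, worker, &reqs[i]) == 0);
    for (i = 0; i < NWORKERS; i++)
        pthread_join(workers[i], NULL);
    pthread_mutex_lock(&ctrl_lock);
    stopping = true;
    pthread_cond_signal(&ctrl_kick);
    pthread_mutex_unlock(&ctrl_lock);
    pthread_join(dev, NULL);
    for (i = 0; i < NWORKERS; i++)
        ok += (reqs[i].status == 0);
    return ok;
}
```

Calling run_demo() from a main() and building with -pthread exercises the pattern: several requests can be in flight at once because the lock is dropped before waiting, which is the property the queue depth argument above relies on.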
    
    With this patch, the guest side gets ~200K/s operations with 300% CPU
    utilization.
    
    Cc: Michael S. Tsirkin <mst@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Cc: Gonglei <arei.gonglei@huawei.com>
    Reviewed-by: Gonglei <arei.gonglei@huawei.com>
    Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
    Message-Id: <20220506131627.180784-4-pizhenwei@bytedance.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio_crypto_common.h 4.11 KB