    io_uring: use task_work for links if possible · c40f6379
    Jens Axboe authored
    Currently links are always done in an async fashion, unless we catch them
    inline after we successfully complete a request without having to resort
    to blocking. This isn't necessarily the most efficient approach; it
    would be better if we could just use the task_work handling for this.
    
    Outside of saving an async jump, we can also do less prep work for these
    kinds of requests.
    
    Running dependent links from the task_work handler yields some nice
    performance benefits. As an example, examples/link-cp from the liburing
    repository uses read+write links to implement a copy operation. Without
    this patch, a cache cold 4G file read from a VM runs in about 3
    seconds:
    
    $ time examples/link-cp /data/file /dev/null
    
    real	0m2.986s
    user	0m0.051s
    sys	0m2.843s
    
    and a subsequent cache hot run looks like this:
    
    $ time examples/link-cp /data/file /dev/null
    
    real	0m0.898s
    user	0m0.069s
    sys	0m0.797s
    
    With this patch in place, the cold case takes about 2.4 seconds:
    
    $ time examples/link-cp /data/file /dev/null
    
    real	0m2.400s
    user	0m0.020s
    sys	0m2.366s
    
    and the cache hot case looks like this:
    
    $ time examples/link-cp /data/file /dev/null
    
    real	0m0.676s
    user	0m0.010s
    sys	0m0.665s
    
    As expected, the (mostly) cache hot case yields the biggest improvement,
    running about 25% faster with this change, while the cache cold case
    yields about a 20% increase in performance. Outside of the performance
    increase, we're using less CPU as well, as we're not using the async
    offload threads at all for this anymore.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>