1. 19 Oct, 2021 36 commits
  2. 18 Oct, 2021 4 commits
    • nvme: wire up completion batching for the IRQ path · 4f502245
      Jens Axboe authored
      Trivial to do now, just need our own io_comp_batch on the stack and pass
      that in to the usual command completion handling.
      
      I pondered making this dependent on how many entries we had to process,
      but even for a single entry there's no discernible difference in
      performance or latency. Running a sync workload over io_uring:
      
      t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
      
      yields the below performance before the patch:
      
      IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      
      and the following after:
      
      IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      
      which definitely isn't slower, about the same if you factor in a bit of
      variance. For peak performance workloads, benchmarking shows a 2%
      improvement.

      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
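The before/after figures quoted in this commit can be checked with a little arithmetic. The sketch below is not part of the patch; the helper names (`bw_mib`, `mean_iops`, `pct_change`) are made up for illustration, and truncating to whole MiB/s is an assumption about how the benchmark tool rounds its output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bandwidth in whole MiB/s for an IOPS figure and a block size in bytes.
 * Integer division truncates; that rounding mode is an assumption. */
static uint64_t bw_mib(uint64_t iops, uint64_t block_size)
{
	return iops * block_size / (1024 * 1024);
}

/* Arithmetic mean of n IOPS samples. */
static double mean_iops(const uint64_t *samples, size_t n)
{
	double sum = 0.0;

	assert(n > 0);	/* a mean over zero samples is undefined */
	for (size_t i = 0; i < n; i++)
		sum += (double)samples[i];
	return sum / (double)n;
}

/* Relative change from 'before' to 'after', in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

Feeding in the three samples from each run with a 512-byte block size, the averages differ by well under one percent, which is in line with the "about the same" reading in the commit message.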
    • io_uring: utilize the io batching infrastructure for more efficient polled IO · b688f11e
      Jens Axboe authored
      Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack
      supports it, we can handle high rates of polled IO more efficiently.
      
      This raises the single core efficiency on my system from ~6.1M IOPS to
      ~6.6M IOPS running a random read workload at depth 128 on two gen2
      Optane drives.

      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
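The caller-side pattern this commit describes (keep a batch on the stack, hand it to the poll hook, flush once afterwards) can be modelled in plain userspace C. This is a simplified sketch, not the kernel code: every `model_*` name is hypothetical, and the `complete` callback field is an assumption about how a lower layer tells the caller how to finish the batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a request (hypothetical fields). */
struct model_request {
	struct model_request *next;
	bool done;
};

/* Simplified stand-in for struct io_comp_batch. */
struct model_comp_batch {
	struct model_request *req_list;
	void (*complete)(struct model_comp_batch *iob);	/* set by the lower layer */
};

/* Hypothetical poll hook type, standing in for f_op->iopoll(). */
typedef int (*model_iopoll_fn)(struct model_request *reqs, int nr,
			       struct model_comp_batch *iob);

/* Lower layer's batch completion: finish everything that was deferred. */
static void model_driver_complete_batch(struct model_comp_batch *iob)
{
	while (iob->req_list) {
		struct model_request *rq = iob->req_list;

		iob->req_list = rq->next;
		rq->done = true;
	}
}

/* A lower layer that supports batching: defers completions into iob. */
static int model_iopoll_batched(struct model_request *reqs, int nr,
				struct model_comp_batch *iob)
{
	for (int i = 0; i < nr; i++) {
		reqs[i].next = iob->req_list;
		iob->req_list = &reqs[i];
	}
	iob->complete = model_driver_complete_batch;
	return nr;
}

/* A lower layer without batching support: ignores iob, completes inline. */
static int model_iopoll_inline(struct model_request *reqs, int nr,
			       struct model_comp_batch *iob)
{
	(void)iob;
	for (int i = 0; i < nr; i++)
		reqs[i].done = true;
	return nr;
}

/* Caller side: batch on the stack, poll, flush once if anything was deferred. */
static int model_do_iopoll(model_iopoll_fn iopoll,
			   struct model_request *reqs, int nr)
{
	struct model_comp_batch iob = { NULL, NULL };
	int found = iopoll(reqs, nr, &iob);

	if (iob.req_list) {
		assert(iob.complete);	/* whoever queued requests sets the callback */
		iob.complete(&iob);
	}
	return found;
}
```

In this model the caller works with either kind of lower layer: if nothing was deferred the list stays empty and the flush is skipped, which mirrors the "if the lower stack supports it" wording above.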
    • nvme: add support for batched completion of polled IO · c234a653
      Jens Axboe authored
      Take advantage of struct io_comp_batch, if passed in to the nvme poll
      handler. If it's set, rather than complete each request individually
      inline, store them in the io_comp_batch list. We only do so for requests
      that will complete successfully, anything else will be completed inline as
      before.

      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
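The rule stated in this commit (defer only requests that complete successfully, and only when a batch was passed in) can be sketched as a small userspace model. The `model_*` types and functions are hypothetical and stand in for the driver's real structures; a `status` of 0 meaning success is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a request (hypothetical fields). */
struct model_request {
	struct model_request *next;
	int status;		/* 0 means success in this model */
	bool completed_inline;	/* set when the request was finished immediately */
};

/* Simplified stand-in for struct io_comp_batch. */
struct model_comp_batch {
	struct model_request *req_list;
};

/* Try to defer a request. Refuse when no batch was passed in
 * or when the request did not complete successfully. */
static bool model_add_to_batch(struct model_request *rq,
			       struct model_comp_batch *iob)
{
	if (!iob || rq->status != 0)
		return false;
	rq->next = iob->req_list;
	iob->req_list = rq;
	return true;
}

/* Poll-handler step for one finished command. */
static void model_handle_completion(struct model_request *rq, int status,
				    struct model_comp_batch *iob)
{
	assert(rq);
	rq->status = status;
	if (!model_add_to_batch(rq, iob))
		rq->completed_inline = true;	/* inline completion, as before */
}
```

Keeping failed requests on the inline path means the batched path only has to handle the common, successful case.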
    • block: add support for blk_mq_end_request_batch() · f794f335
      Jens Axboe authored
      Instead of calling blk_mq_end_request() on a single request, add a helper
      that takes the new struct io_comp_batch and completes any request stored
      in there.

      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
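The difference between ending a single request and ending a whole batch can be illustrated with a userspace model. This is only a sketch under stated assumptions: the `model_*` names are hypothetical, and the `model_flushes` counter stands in for whatever shared per-completion bookkeeping a batched helper could amortize; it is not a claim about what the real helper does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a block-layer request (hypothetical fields). */
struct model_request {
	struct model_request *next;	/* link in the batch list */
	bool ended;			/* set once the request has been completed */
};

/* Simplified stand-in for struct io_comp_batch: a singly linked list. */
struct model_comp_batch {
	struct model_request *req_list;
};

/* Counts the (hypothetical) shared bookkeeping steps performed so far. */
static int model_flushes;

/* Per-request path: complete one request, pay the bookkeeping cost once. */
static void model_end_request(struct model_request *rq)
{
	assert(!rq->ended);	/* a request must only be completed once */
	rq->ended = true;
	model_flushes++;
}

/* Push a request onto the batch for later completion. */
static void model_batch_add(struct model_comp_batch *iob,
			    struct model_request *rq)
{
	rq->next = iob->req_list;
	iob->req_list = rq;
}

/* Batched path: complete everything stored in the batch and pay the
 * bookkeeping cost once for the whole batch. Returns the request count. */
static int model_end_request_batch(struct model_comp_batch *iob)
{
	int completed = 0;

	while (iob->req_list) {
		struct model_request *rq = iob->req_list;

		iob->req_list = rq->next;
		rq->next = NULL;
		assert(!rq->ended);
		rq->ended = true;
		completed++;
	}
	if (completed)
		model_flushes++;
	return completed;
}
```

In this model, three requests completed through the batch cost one bookkeeping step instead of three, and calling the batch helper on an empty batch does nothing.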