    nvme: wire up completion batching for the IRQ path · 4f502245
    Jens Axboe authored
    Trivial to do now, just need our own io_comp_batch on the stack and pass
    that in to the usual command completion handling.
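
    The wiring is roughly the following (a minimal sketch, assuming the
    DEFINE_IO_COMP_BATCH, nvme_poll_cq and nvme_pci_complete_batch helpers
    as used by the NVMe PCI driver around this time; not the literal
    patch): the interrupt handler keeps the batch on its stack, the CQ
    reaping path collects finished requests into it, and anything gathered
    is completed in one flush before the handler returns.

    static irqreturn_t nvme_irq(int irq, void *data)
    {
    	struct nvme_queue *nvmeq = data;
    	DEFINE_IO_COMP_BATCH(iob);	/* per-invocation batch on the stack */

    	/*
    	 * Reap the completion queue; finished requests are queued onto the
    	 * batch instead of being completed one at a time.
    	 */
    	if (nvme_poll_cq(nvmeq, &iob)) {
    		/* Flush whatever was collected in a single pass. */
    		if (!rq_list_empty(iob.req_list))
    			nvme_pci_complete_batch(&iob);
    		return IRQ_HANDLED;
    	}
    	return IRQ_NONE;
    }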
    
    I pondered making this dependent on how many entries we had to process,
    but even for a single entry there's no discernible difference in
    performance or latency. Running a sync workload over io_uring:
    
    t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
    
    yields the below performance before the patch:
    
    IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
    
    and the following after:
    
    IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
    
    which definitely isn't slower, and is about the same once you factor in
    a bit of variance. For peak performance workloads, benchmarking shows a
    2% improvement.
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>