    block: loop: support DIO & AIO · bc07c10a
    Ming Lei authored
    There are at least three advantages to using direct I/O and AIO to
    read/write the loop device's backing file:
    
    1) double caching is avoided, so memory usage drops
    considerably
    
    2) unlike user-space direct I/O, there is no cost of
    pinning pages
    
    3) context switches are avoided while still obtaining good throughput
    - with buffered file reads, top random I/O throughput is often obtained
    only if requests are submitted concurrently from many tasks; but
    sequential I/O mostly hits the page cache, so concurrent submission
    often introduces unnecessary context switches and can't improve
    throughput much. There was a discussion[1] about using non-blocking
    I/O to mitigate this problem for applications.
    - with direct I/O and AIO, concurrent submission can be avoided
    without hurting random read throughput
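
    As a minimal sketch of how user space might ask the loop driver to use
    direct I/O on the backing file: this assumes the LOOP_SET_DIRECT_IO
    ioctl from <linux/loop.h>, which this message does not itself
    describe. The helper name loop_set_direct_io() and the device path in
    the comment are hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/loop.h>

/* Fallback for older kernel headers; assumed to match mainline <linux/loop.h>. */
#ifndef LOOP_SET_DIRECT_IO
#define LOOP_SET_DIRECT_IO 0x4C08
#endif

/*
 * Ask the loop driver to start (enable != 0) or stop (enable == 0) using
 * direct I/O on the backing file of an already-configured loop device.
 * loop_fd is an open descriptor for the device, e.g. /dev/loop0
 * (hypothetical path). Returns 0 on success, -1 with errno set on failure.
 */
int loop_set_direct_io(int loop_fd, int enable)
{
	return ioctl(loop_fd, LOOP_SET_DIRECT_IO, (unsigned long)(enable != 0));
}
```

    A caller would open the loop device, call loop_set_direct_io(fd, 1),
    and check the return value, since the request may be refused.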
    
    xfstests (-g auto, ext4) basically passes when run with
    direct I/O (aio); the one exception is generic/232, but it also fails
    with loop buffered I/O (4.2-rc6-next-20150814).
    
    The fio results below show the performance impact:
    	4-job fio test inside an ext4 file system over a loop block device
    
    1) How to run
    	- KVM: 4 VCPUs, 2G RAM
    	- linux kernel: 4.2-rc6-next-20150814(base) with the patchset
    	- the loop block is over one image on SSD.
    	- linux psync, 4 jobs, size 1500M, ext4 over loop block
    	- test result: IOPS from fio output
    
    2) Throughput(IOPS) becomes a bit better with direct I/O(aio)
            -------------------------------------------------------------
            test cases          |randread   |read   |randwrite  |write  |
            -------------------------------------------------------------
            base                |8015       |113811 |67442      |106978 |
            -------------------------------------------------------------
            base+loop aio       |8136       |125040 |67811      |111376 |
            -------------------------------------------------------------
    
    - this is presumably because more page cache is available to the
    application, or because one extra page copy is avoided with direct I/O
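
    To put "a bit better" into numbers, the relative change can be
    computed from the IOPS table above. This is only an illustrative
    helper; the name iops_gain_pct() is hypothetical.

```c
/* Relative IOPS change of `patched` over `base`, in percent. */
double iops_gain_pct(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

    With the table values this gives roughly +1.5% for randread, +9.9%
    for read, +0.5% for randwrite and +4.1% for write.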
    
    3) context switch
            - context switches decreased by ~50% with loop direct I/O (aio)
    	compared with loop buffered I/O (4.2-rc6-next-20150814)
    
    4) memory usage from /proc/meminfo
            -------------------------------------------------------------
                                       | Buffers       | Cached
            -------------------------------------------------------------
            base                       | > 760MB       | ~950MB
            -------------------------------------------------------------
            base+loop direct I/O(aio)  | < 5MB         | ~1.6GB
            -------------------------------------------------------------
    
    - so much more page cache is available to applications with
    direct I/O
    
    [1] https://lwn.net/Articles/612483/

    Signed-off-by: Ming Lei <ming.lei@canonical.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>