    ext4: allow concurrent unaligned dio overwrites · 310ee090
    Brian Foster authored
    We've had reports of a significant performance regression for sub-block
    (unaligned) direct writes due to the added exclusivity restrictions
    in ext4. The purpose of the exclusivity requirement for unaligned
    direct writes is to avoid data corruption caused by unserialized
    partial block zeroing in the iomap dio layer across overlapping
    writes.
    
    XFS has similar requirements for the same underlying reasons, yet
    doesn't suffer the extreme performance regression that ext4 does.
    The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY
    mode, which allows for optimistic submission of concurrent unaligned
    I/O and kicks back writes that require partial block zeroing such
    that they can be submitted in a safe, exclusive context. Since ext4
    already performs most of these checks pre-submission, it can support
    something similar without necessarily relying on the iomap flag and
    associated retry mechanism.
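
The optimistic-submit-then-retry pattern described above can be sketched as a small standalone model. This is not XFS or iomap code: `struct dio_write`, `submit_write()`, `write_unaligned()` and the `enum lock_mode` values are hypothetical names invented for illustration, and the only behavior taken from the text is that an overwrite-only submission kicks back writes needing partial block zeroing so they can be resubmitted exclusively.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for shared vs. exclusive inode locking. */
enum lock_mode { LOCK_SHARED, LOCK_EXCLUSIVE };

/* Hypothetical model of one unaligned direct write. */
struct dio_write {
    bool needs_zeroing;        /* would the I/O layer have to zero part of a block? */
    int submissions;           /* number of submission attempts made */
    enum lock_mode final_mode; /* lock mode under which the write completed */
};

/*
 * Model of the I/O layer: in overwrite-only mode it refuses any write
 * that would need partial block zeroing and reports -EAGAIN instead.
 */
static int submit_write(struct dio_write *w, bool overwrite_only)
{
    w->submissions++;
    if (overwrite_only && w->needs_zeroing)
        return -EAGAIN;
    return 0;
}

/*
 * Optimistic path: try under the shared lock first, and fall back to
 * the exclusive lock only when the write is kicked back.
 */
static int write_unaligned(struct dio_write *w)
{
    int ret;

    w->final_mode = LOCK_SHARED;
    ret = submit_write(w, true);
    if (ret == -EAGAIN) {
        /* Safe, serialized context: zeroing is allowed here. */
        w->final_mode = LOCK_EXCLUSIVE;
        ret = submit_write(w, false);
    }
    return ret;
}
```

In this model a pure overwrite completes in one shared-mode submission, while a write that needs zeroing is submitted twice and finishes under the exclusive lock.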
    
    Update the dio write submission path to allow concurrent submission
    of unaligned direct writes that are purely overwrite and so will not
    require block zeroing. To improve readability of the various related
    checks, move the unaligned I/O handling down into
    ext4_dio_write_checks(), where the dio draining and force wait logic
    can immediately follow the locking requirement checks. Finally, the
    IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a
    precaution should the ext4 overwrite logic ever become inconsistent
    with the zeroing expectations of iomap dio.
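
As a rough illustration of the behavior change for unaligned writes, the following standalone sketch contrasts the locking decision before and after this patch. It is a simplified model, not the ext4 code: `struct write_req`, its fields and the helper names are all hypothetical, and "pure overwrite" is assumed here to mean that the range lies within the current file size and every block in it is already allocated and written. Locking rules for aligned writes are deliberately left out of scope.

```c
#include <stdbool.h>

/* Hypothetical description of one direct write request. */
struct write_req {
    unsigned long long offset;    /* byte offset of the write */
    unsigned long long len;       /* length in bytes */
    unsigned long long file_size; /* current file size in bytes */
    unsigned int block_size;      /* filesystem block size, a power of two */
    bool blocks_written;          /* assumed: every block in range is allocated and written */
};

/* A write is unaligned if its offset or length is not a block multiple. */
static bool is_unaligned(const struct write_req *r)
{
    unsigned long long mask = (unsigned long long)r->block_size - 1;

    return ((r->offset | r->len) & mask) != 0;
}

/* Assumed definition: no extension of the file and no unallocated blocks. */
static bool is_pure_overwrite(const struct write_req *r)
{
    return r->offset + r->len <= r->file_size && r->blocks_written;
}

/* Before the patch (as modeled): every unaligned write is serialized. */
static bool unaligned_exclusive_before(const struct write_req *r)
{
    return is_unaligned(r);
}

/* After the patch (as modeled): only unaligned writes that may need
 * partial block zeroing are serialized. */
static bool unaligned_exclusive_after(const struct write_req *r)
{
    return is_unaligned(r) && !is_pure_overwrite(r);
}
```

For example, a 2k write at offset 2048 into a fully written file with 4k blocks is unaligned and serialized in the "before" model, but can proceed concurrently in the "after" model.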
    
    The performance improvement for sub-block direct write I/O is shown
    by the following fio test on a 64-CPU guest VM:
    
    Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting
    --overwrite=1 --thread --size=10G --filename=/mnt/fio
    --readwrite=write --ramp_time=10s --runtime=60s --numjobs=8
    --blocksize=2k --iodepth=256 --allow_file_create=0
    
    v6.2:		write: IOPS=4328, BW=8724KiB/s
    v6.2 (patched):	write: IOPS=801k, BW=1565MiB/s

    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230314130759.642710-1-bfoster@redhat.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>