• Dave Chinner's avatar
    xfs: punching delalloc extents on write failure is racy · 198dd8ae
    Dave Chinner authored
    xfs_buffered_write_iomap_end() has a comment about the safety of
    punching delalloc extents based holding the IOLOCK_EXCL. This
    comment is wrong, and punching delalloc extents is not race free.
    
    When we punch out a delalloc extent after a write failure in
    xfs_buffered_write_iomap_end(), we punch out the page cache with
    truncate_pagecache_range() before we punch out the delalloc extents.
    At this point, we only hold the IOLOCK_EXCL, so there is nothing
    stopping mmap() write faults racing with this cleanup operation,
    reinstantiating a folio over the range we are about to punch and
    hence requiring the delalloc extent to be kept.
    
    If this race condition is hit, we can end up with a dirty page in
    the page cache that has no delalloc extent or space reservation
    backing it. This leads to bad things happening at writeback time.
    
    To avoid this race condition, we need the page cache truncation to
    be atomic w.r.t. the extent manipulation. We can do this by holding
    the mapping->invalidate_lock exclusively across this operation -
    this will prevent new pages from being inserted into the page cache
    whilst we are removing the pages and the backing extent and space
    reservation.
    
    Taking the mapping->invalidate_lock exclusively in the buffered
    write IO path is safe - it naturally nests inside the IOLOCK (see
    truncate and fallocate paths). iomap_zero_range() can be called from
    under the mapping->invalidate_lock (from the truncate path via
    either xfs_zero_eof() or xfs_truncate_page(), but iomap_zero_iter()
    will not instantiate new delalloc pages (because it skips holes) and
    hence will not ever need to punch out delalloc extents on failure.
    
    Fix the locking issue, and clean up the code logic a little to avoid
    unnecessary work if we didn't allocate the delalloc extent or wrote
    the entire region we allocated.
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
    198dd8ae
xfs_iomap.c 37.6 KB