• Dave Chinner's avatar
    xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly · 0d085a52
    Dave Chinner authored
    XFS has been having trouble with stray delayed allocation extents
    beyond EOF for a long time. Recent changes to the collapse range
    code has triggered erroneous EBUSY errors on page invalidtion for
    block size smaller than page size filesystems. These
    have been caused by dirty buffers beyond EOF on a partial page which
    do not get written to disk during a sync.
    
    The issue is that write-ahead in xfs_cluster_write() finds such a
    partial page and handles it by leaving the page dirty but pushing it
    into a writeback state. This used to work just fine, as the
    write_cache_pages() code would then find the dirty partial page in
    the next mapping tree lookup as the dirty tag is still set.
    
    Unfortunately, when we moved to a mark and sweep approach to
    writeback to fix other writeback sync issues, we broken this. THe
    act of marking the page as under writeback now clears the TOWRITE
    tag in the radix tree, even though the page is still dirty. This
    causes the TOWRITE tag to be cleared, and hence the next lookup on
    the mapping tree does not find the dirty partial page and so doesn't
    try to write it again.
    
    This same writeback bug was found recently in ext4 and fixed in
    commit 1c8349a1 ("ext4: fix data integrity sync in ordered mode")
    without communication to the wider filesystem community. We can use
    exactly the same fix here so the TOWRITE flag is not cleared on
    partial page writes.
    
    cc: stable@vger.kernel.org # dependent on 1c8349a1Root-cause-found-by: default avatarBrian Foster <bfoster@redhat.com>
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    
    0d085a52
xfs_aops.c 48.8 KB