• Dave Chinner's avatar
    xfs: log forces imply data device cache flushes · 2bf1ec0f
    Dave Chinner authored
    After fixing the tail_lsn vs cache flush race, generic/482 continued
    to fail in a similar way where cache flushes were missing before
    iclog FUA writes. Tracing of iclog state changes during the fsstress
    workload portion of the test (via xlog_iclog* events) indicated that
    iclog writes were coming from two sources - CIL pushes and log
    forces (due to fsync/O_SYNC operations). All of the cases where a
    recovery problem was triggered indicated that the log force was the
    source of the iclog write that was not preceeded by a cache flush.
    
    This was an oversight in the modifications made in commit
    eef983ff ("xfs: journal IO cache flush reductions"). Log forces
    for fsync imply a data device cache flush has been issued if an
    iclog was flushed to disk and is indicated to the caller via the
    log_flushed parameter so they can elide the device cache flush if
    the journal issued one.
    
    The change in eef983ff results in iclogs only issuing a cache
    flush if XLOG_ICL_NEED_FLUSH is set on the iclog, but this was not
    added to the iclogs that the log force code flushes to disk. Hence
    log forces are no longer guaranteeing that a cache flush is issued,
    hence opening up a potential on-disk ordering failure.
    
    Log forces should also set XLOG_ICL_NEED_FUA as well to ensure that
    the actual iclogs it forces to the journal are also on stable
    storage before it returns to the caller.
    
    This patch introduces the xlog_force_iclog() helper function to
    encapsulate the process of taking a reference to an iclog, switching
    its state if WANT_SYNC and flushing it to stable storage correctly.
    
    Both xfs_log_force() and xfs_log_force_lsn() are converted to use
    it, as is xlog_unmount_write() which has an elaborate method of
    doing exactly the same "write this iclog to stable storage"
    operation.
    
    Further, if the log force code needs to wait on a iclog in the
    WANT_SYNC state, it needs to ensure that iclog also results in a
    cache flush being issued. This covers the case where the iclog
    contains the commit record of the CIL flush that the log force
    triggered, but it hasn't been written yet because there is still an
    active reference to the iclog.
    
    Note: this whole cache flush whack-a-mole patch is a result of log
    forces still being iclog state centric rather than being CIL
    sequence centric. Most of this nasty code will go away in future
    when log forces are converted to wait on CIL sequence push
    completion rather than iclog completion. With the CIL push algorithm
    guaranteeing that the CIL checkpoint is fully on stable storage when
    it completes, we no longer need to iterate iclogs and push them to
    ensure a CIL sequence push has completed and so all this nasty iclog
    iteration and flushing code will go away.
    
    Fixes: eef983ff ("xfs: journal IO cache flush reductions")
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
    2bf1ec0f
xfs_log.c 108 KB