Commit a79b28c2 authored by Dave Chinner, committed by Darrick J. Wong

xfs: separate CIL commit record IO

To allow for iclog IO device cache flush behaviour to be optimised,
we first need to separate out the commit record iclog IO from the
rest of the checkpoint so we can wait for the checkpoint IO to
complete before we issue the commit record.

This separation is only necessary if the commit record is being
written into a different iclog to the start of the checkpoint, as the
upcoming cache flushing changes require completion ordering against
the other iclogs submitted by the checkpoint.

If the entire checkpoint and commit is in the one iclog, then they
are both covered by the one set of cache flush primitives on the
iclog and hence there is no need to separate them for ordering.

Otherwise, we need to wait for all the previous iclogs to complete
so they are ordered correctly and made stable by the REQ_PREFLUSH
that the commit record iclog IO issues. This guarantees that if a
reader sees the commit record in the journal, they will also see the
entire checkpoint that commit record closes off.

This also provides the guarantee that when the commit record IO
completes, we can safely unpin all the log items in the checkpoint
so they can be written back because the entire checkpoint is stable
in the journal.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
parent 18842e0a
@@ -786,10 +786,12 @@ xfs_log_mount_cancel(
 }
 
 /*
- * Wait for the iclog to be written disk, or return an error if the log has been
- * shut down.
+ * Wait for the iclog and all prior iclogs to be written disk as required by the
+ * log force state machine. Waiting on ic_force_wait ensures iclog completions
+ * have been ordered and callbacks run before we are woken here, hence
+ * guaranteeing that all the iclogs up to this one are on stable storage.
  */
-static int
+int
 xlog_wait_on_iclog(
 	struct xlog_in_core	*iclog)
 		__releases(iclog->ic_log->l_icloglock)
@@ -870,6 +870,15 @@ xlog_cil_push_work(
 	wake_up_all(&cil->xc_commit_wait);
 	spin_unlock(&cil->xc_push_lock);
 
+	/*
+	 * If the checkpoint spans multiple iclogs, wait for all previous
+	 * iclogs to complete before we submit the commit_iclog.
+	 */
+	if (ctx->start_lsn != commit_lsn) {
+		spin_lock(&log->l_icloglock);
+		xlog_wait_on_iclog(commit_iclog->ic_prev);
+	}
+
 	/* release the hounds! */
 	xfs_log_release_iclog(commit_iclog);
 	return;
@@ -584,6 +584,8 @@ xlog_wait(
 	remove_wait_queue(wq, &wait);
 }
 
+int xlog_wait_on_iclog(struct xlog_in_core *iclog);
+
 /*
  * The LSN is valid so long as it is behind the current LSN. If it isn't, this
  * means that the next log record that includes this metadata could have a
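
The ordering rule described in the commit message can be illustrated outside the kernel with a small single-threaded model. This is only a sketch under stated assumptions: `model_iclog`, `model_wait_on_iclog`, `model_all_prior_stable` and `model_submit_commit` are hypothetical names invented for illustration, "waiting" is modelled by simply marking buffers as complete, and no locking or real IO is involved. It is not the XFS implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-core log buffer; NOT the kernel structure. */
struct model_iclog {
	uint64_t		lsn;		/* LSN at the start of this buffer */
	bool			on_disk;	/* has this buffer's IO completed? */
	struct model_iclog	*prev;		/* previously submitted buffer, or NULL */
};

/*
 * Model of "wait for this iclog and all prior iclogs". Here waiting just
 * marks each incomplete buffer as complete and counts how many we waited on.
 */
int model_wait_on_iclog(struct model_iclog *iclog)
{
	int waited = 0;

	for (; iclog != NULL; iclog = iclog->prev) {
		if (!iclog->on_disk) {
			iclog->on_disk = true;
			waited++;
		}
	}
	return waited;
}

/* True if every buffer before 'iclog' has completed. */
bool model_all_prior_stable(const struct model_iclog *iclog)
{
	const struct model_iclog *p;

	for (p = iclog->prev; p != NULL; p = p->prev) {
		if (!p->on_disk)
			return false;
	}
	return true;
}

/*
 * Submit the buffer holding the commit record. If the checkpoint started in
 * a different buffer (start_lsn differs from the commit buffer's LSN), wait
 * for all earlier buffers first. Returns the number of buffers waited on.
 */
int model_submit_commit(uint64_t start_lsn, struct model_iclog *commit)
{
	int waited = 0;

	if (start_lsn != commit->lsn) {
		waited = model_wait_on_iclog(commit->prev);
		/* The commit record must never be issued ahead of its checkpoint. */
		assert(model_all_prior_stable(commit));
	}
	commit->on_disk = true;	/* models the commit record IO completing */
	return waited;
}
```

In this model, a checkpoint that starts at LSN 10 and commits in a third buffer at LSN 30 waits on two earlier buffers, while a checkpoint whose start and commit share one buffer waits on none, which mirrors the `ctx->start_lsn != commit_lsn` test in the change to `xlog_cil_push_work()`.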