xfs: don't run shutdown callbacks on active iclogs

When the log is shutdown, it currently walks all the iclogs and runs callbacks that are attached to the iclogs, regardless of whether the iclog is queued for IO completion or not. This creates a problem for contexts attaching callbacks to iclogs in that a racing shutdown can run the callbacks even before the attaching context has finished processing the iclog and releasing it for IO submission. If the callback processing of the iclog frees the structure that is attached to the iclog, then this leads to an UAF scenario that can only be protected against by holding the icloglock from the point callbacks are attached through to the release of the iclog. While we currently do this, it is not practical or sustainable. Hence we need to make shutdown processing the responsibility of the context that holds active references to the iclog. We know that the contexts attaching callbacks to the iclog must have active references to the iclog, and that means they must be in either ACTIVE or WANT_SYNC states. xlog_state_do_callback() will skip over iclogs in these states -except- when the log is shut down. xlog_state_do_callback() checks the state of the iclogs while holding the icloglock, therefore the reference count/state change that occurs in xlog_state_release_iclog() after the callbacks are atomic w.r.t. shutdown processing. We can't push the responsibility of callback cleanup onto the CIL context because we can have ACTIVE iclogs that have callbacks attached that have already been released. Hence we really need to internalise the cleanup of callbacks into xlog_state_release_iclog() processing. Indeed, we already have that internalisation via: xlog_state_release_iclog drop last reference ->SYNCING xlog_sync xlog_write_iclog if (log_is_shutdown) xlog_state_done_syncing() xlog_state_do_callback() <process shutdown on iclog that is now in SYNCING state> The problem is that xlog_state_release_iclog() aborts before doing anything if the log is already shut down. It assumes that the callbacks have already been cleaned up, and it doesn't need to do any cleanup. Hence the fix is to remove the xlog_is_shutdown() check from xlog_state_release_iclog() so that reference counts are correctly released from the iclogs, and when the reference count is zero we always transition to SYNCING if the log is shut down. Hence we'll always enter the xlog_sync() path in a shutdown and eventually end up erroring out the iclog IO and running xlog_state_do_callback() to process the callbacks attached to the iclog. This allows us to stop processing referenced ACTIVE/WANT_SYNC iclogs directly in the shutdown code, and in doing so gets rid of the UAF vector that currently exists. This then decouples the adding of callbacks to the iclogs from xlog_state_release_iclog() as we guarantee that xlog_state_release_iclog() will process the callbacks if the log has been shut down before xlog_state_release_iclog() has been called. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfs: don't run shutdown callbacks on active iclogs
When the log is shutdown, it currently walks all the iclogs and runs callbacks that are attached to the iclogs, regardless of whether the iclog is queued for IO completion or not. This creates a problem for contexts attaching callbacks to iclogs in that a racing shutdown can run the callbacks even before the attaching context has finished processing the iclog and releasing it for IO submission. If the callback processing of the iclog frees the structure that is attached to the iclog, then this leads to an UAF scenario that can only be protected against by holding the icloglock from the point callbacks are attached through to the release of the iclog. While we currently do this, it is not practical or sustainable. Hence we need to make shutdown processing the responsibility of the context that holds active references to the iclog. We know that the contexts attaching callbacks to the iclog must have active references to the iclog, and that means they must be in either ACTIVE or WANT_SYNC states. xlog_state_do_callback() will skip over iclogs in these states -except- when the log is shut down. xlog_state_do_callback() checks the state of the iclogs while holding the icloglock, therefore the reference count/state change that occurs in xlog_state_release_iclog() after the callbacks are atomic w.r.t. shutdown processing. We can't push the responsibility of callback cleanup onto the CIL context because we can have ACTIVE iclogs that have callbacks attached that have already been released. Hence we really need to internalise the cleanup of callbacks into xlog_state_release_iclog() processing. Indeed, we already have that internalisation via: xlog_state_release_iclog drop last reference ->SYNCING xlog_sync xlog_write_iclog if (log_is_shutdown) xlog_state_done_syncing() xlog_state_do_callback() <process shutdown on iclog that is now in SYNCING state> The problem is that xlog_state_release_iclog() aborts before doing anything if the log is already shut down. It assumes that the callbacks have already been cleaned up, and it doesn't need to do any cleanup. Hence the fix is to remove the xlog_is_shutdown() check from xlog_state_release_iclog() so that reference counts are correctly released from the iclogs, and when the reference count is zero we always transition to SYNCING if the log is shut down. Hence we'll always enter the xlog_sync() path in a shutdown and eventually end up erroring out the iclog IO and running xlog_state_do_callback() to process the callbacks attached to the iclog. This allows us to stop processing referenced ACTIVE/WANT_SYNC iclogs directly in the shutdown code, and in doing so gets rid of the UAF vector that currently exists. This then decouples the adding of callbacks to the iclogs from xlog_state_release_iclog() as we guarantee that xlog_state_release_iclog() will process the callbacks if the log has been shut down before xlog_state_release_iclog() has been called. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
502a01fa · Dave Chinner · Darrick J. Wong · aad7272a · 502a01fa · 502a01fa
Commit 502a01fa authored Aug 10, 2021 by Dave Chinner Committed by Darrick J. Wong Aug 16, 2021
Show whitespace changes
Inline Side-by-side

Showing with 38 additions and 12 deletions

fs/xfs/xfs_log.c fs/xfs/xfs_log.c +31 -4

fs/xfs/xfs_log_cil.c fs/xfs/xfs_log_cil.c +7 -8

No files found.
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -41,6 +41,8 @@ xlog_dealloc_log(
 /* local state machine functions */
 STATIC void xlog_state_done_syncing(
 	struct xlog_in_core	*iclog);
+STATIC void xlog_state_do_callback(
+	struct xlog		*log);
 STATIC int
 xlog_state_get_iclog_space(
 	struct xlog		*log,
@@ -491,6 +493,11 @@ xfs_log_reserve(
 * space waiters so they can process the newly set shutdown state. We really
 * don't care what order we process callbacks here because the log is shut down
 * and so state cannot change on disk anymore.
+ *
+ * We avoid processing actively referenced iclogs so that we don't run callbacks
+ * while the iclog owner might still be preparing the iclog for IO submssion.
+ * These will be caught by xlog_state_iclog_release() and call this function
+ * again to process any callbacks that may have been added to that iclog.
 */
 static void
 xlog_state_shutdown_callbacks(
@@ -502,7 +509,12 @@ xlog_state_shutdown_callbacks(
 	spin_lock(&log->l_icloglock);
 	iclog = log->l_iclog;
 	do {
+		if (atomic_read(&iclog->ic_refcnt)) {
+			/* Reference holder will re-run iclog callbacks. */
+			continue;
+		}
 		list_splice_init(&iclog->ic_callbacks, &cb_list);
+		wake_up_all(&iclog->ic_write_wait);
 		wake_up_all(&iclog->ic_force_wait);
 	} while ((iclog = iclog->ic_next) != log->l_iclog);
@@ -546,12 +558,11 @@ xlog_state_release_iclog(
 	xfs_lsn_t		old_tail_lsn)
 {
 	xfs_lsn_t		tail_lsn;
+	bool			last_ref;
 	lockdep_assert_held(&log->l_icloglock);
 	trace_xlog_iclog_release(iclog, _RET_IP_);
-	if (xlog_is_shutdown(log))
-		return -EIO;
 	/*
 	 * Grabbing the current log tail needs to be atomic w.r.t. the writing
 	 * of the tail LSN into the iclog so we guarantee that the log tail does
@@ -569,7 +580,23 @@ xlog_state_release_iclog(
 			iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
 	}
-	if (!atomic_dec_and_test(&iclog->ic_refcnt))
+	last_ref = atomic_dec_and_test(&iclog->ic_refcnt);
+	if (xlog_is_shutdown(log)) {
+		/*
+		 * If there are no more references to this iclog, process the
+		 * pending iclog callbacks that were waiting on the release of
+		 * this iclog.
+		 */
+		if (last_ref) {
+			spin_unlock(&log->l_icloglock);
+			xlog_state_shutdown_callbacks(log);
+			spin_lock(&log->l_icloglock);
+		}
+		return -EIO;
+	}
+	if (!last_ref)
 		return 0;
 	if (iclog->ic_state != XLOG_STATE_WANT_SYNC) {

--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -891,11 +891,10 @@ xlog_cil_push_work(
 	xfs_log_ticket_ungrant(log, tic);
 	/*
-	 * Once we attach the ctx to the iclog, a shutdown can process the
+	 * Once we attach the ctx to the iclog, it is effectively owned by the
-	 * iclog, run the callbacks and free the ctx. The only thing preventing
+	 * iclog and we can only use it while we still have an active reference
-	 * this potential UAF situation here is that we are holding the
+	 * to the iclog. i.e. once we call xlog_state_release_iclog() we can no
-	 * icloglock. Hence we cannot access the ctx once we have attached the
+	 * longer safely reference the ctx.
-	 * callbacks and dropped the icloglock.
 	 */
 	spin_lock(&log->l_icloglock);
 	if (xlog_is_shutdown(log)) {
@@ -927,9 +926,6 @@ xlog_cil_push_work(
 	 * wakeup until this commit_iclog is written to disk.  Hence we use the
 	 * iclog header lsn and compare it to the commit lsn to determine if we
 	 * need to wait on iclogs or not.
-	 *
-	 * NOTE: It is not safe to reference the ctx after this check as we drop
-	 * the icloglock if we have to wait for completion of other iclogs.
 	 */
 	if (ctx->start_lsn != commit_lsn) {
 		xfs_lsn_t	plsn;
@@ -959,6 +955,9 @@ xlog_cil_push_work(
 	 */
 	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
 	xlog_state_release_iclog(log, commit_iclog, preflush_tail_lsn);
+	/* Not safe to reference ctx now! */
 	spin_unlock(&log->l_icloglock);
 	return;