Commit ab9c81ef authored by Dave Chinner's avatar Dave Chinner Committed by Darrick J. Wong

xfs: shutdown in intent recovery has non-intent items in the AIL

generic/388 triggered a failure in RUI recovery due to a corrupted
btree record and the system then locked up hard due to a subsequent
assert failure while holding a spinlock cancelling intents:

 XFS (pmem1): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_trans.c:964).  Shutting down filesystem.
 XFS (pmem1): Please unmount the filesystem and rectify the problem(s)
 XFS: Assertion failed: !xlog_item_is_intent(lip), file: fs/xfs/xfs_log_recover.c, line: 2632
 Call Trace:
  <TASK>
  xlog_recover_cancel_intents.isra.0+0xd1/0x120
  xlog_recover_finish+0xb9/0x110
  xfs_log_mount_finish+0x15a/0x1e0
  xfs_mountfs+0x540/0x910
  xfs_fs_fill_super+0x476/0x830
  get_tree_bdev+0x171/0x270
  ? xfs_init_fs_context+0x1e0/0x1e0
  xfs_fs_get_tree+0x15/0x20
  vfs_get_tree+0x24/0xc0
  path_mount+0x304/0xba0
  ? putname+0x55/0x60
  __x64_sys_mount+0x108/0x140
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Essentially, there's dirty metadata in the AIL from intent recovery
transactions, so when we go to cancel the remaining intents we assume
that all objects after the first non-intent log item in the AIL are
not intents.

This is not true. Intent recovery can log new intents to continue
the operations the original intent could not complete in a single
transaction. The new intents are committed before they are deferred,
which means if the CIL commits in the background they will get
inserted into the AIL at the head.

Hence if we shut down the filesystem while processing intent
recovery, the AIL may have new intents active at the current head.
Hence this check:

                /*
                 * We're done when we see something other than an intent.
                 * There should be no intents left in the AIL now.
                 */
                if (!xlog_item_is_intent(lip)) {
#ifdef DEBUG
                        for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
                                ASSERT(!xlog_item_is_intent(lip));
#endif
                        break;
                }

in both xlog_recover_process_intents() and
log_recover_cancel_intents() is simply not valid. It was valid back
when we only had EFI/EFD intents and didn't chain intents, but it
hasn't been valid ever since intent recovery could create and commit
new intents.

Given that crashing the mount task like this pretty much prevents
diagnosing what went wrong that lead to the initial failure that
triggered intent cancellation, just remove the checks altogether.
Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
parent d2d7c047
...@@ -2519,21 +2519,22 @@ xlog_abort_defer_ops( ...@@ -2519,21 +2519,22 @@ xlog_abort_defer_ops(
xfs_defer_ops_capture_free(mp, dfc); xfs_defer_ops_capture_free(mp, dfc);
} }
} }
/* /*
* When this is called, all of the log intent items which did not have * When this is called, all of the log intent items which did not have
* corresponding log done items should be in the AIL. What we do now * corresponding log done items should be in the AIL. What we do now is update
* is update the data structures associated with each one. * the data structures associated with each one.
* *
* Since we process the log intent items in normal transactions, they * Since we process the log intent items in normal transactions, they will be
* will be removed at some point after the commit. This prevents us * removed at some point after the commit. This prevents us from just walking
* from just walking down the list processing each one. We'll use a * down the list processing each one. We'll use a flag in the intent item to
* flag in the intent item to skip those that we've already processed * skip those that we've already processed and use the AIL iteration mechanism's
* and use the AIL iteration mechanism's generation count to try to * generation count to try to speed this up at least a bit.
* speed this up at least a bit.
* *
* When we start, we know that the intents are the only things in the * When we start, we know that the intents are the only things in the AIL. As we
* AIL. As we process them, however, other items are added to the * process them, however, other items are added to the AIL. Hence we know we
* AIL. * have started recovery on all the pending intents when we find an non-intent
* item in the AIL.
*/ */
STATIC int STATIC int
xlog_recover_process_intents( xlog_recover_process_intents(
...@@ -2556,17 +2557,8 @@ xlog_recover_process_intents( ...@@ -2556,17 +2557,8 @@ xlog_recover_process_intents(
for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
lip != NULL; lip != NULL;
lip = xfs_trans_ail_cursor_next(ailp, &cur)) { lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
/* if (!xlog_item_is_intent(lip))
* We're done when we see something other than an intent.
* There should be no intents left in the AIL now.
*/
if (!xlog_item_is_intent(lip)) {
#ifdef DEBUG
for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
ASSERT(!xlog_item_is_intent(lip));
#endif
break; break;
}
/* /*
* We should never see a redo item with a LSN higher than * We should never see a redo item with a LSN higher than
...@@ -2607,8 +2599,9 @@ xlog_recover_process_intents( ...@@ -2607,8 +2599,9 @@ xlog_recover_process_intents(
} }
/* /*
* A cancel occurs when the mount has failed and we're bailing out. * A cancel occurs when the mount has failed and we're bailing out. Release all
* Release all pending log intent items so they don't pin the AIL. * pending log intent items that we haven't started recovery on so they don't
* pin the AIL.
*/ */
STATIC void STATIC void
xlog_recover_cancel_intents( xlog_recover_cancel_intents(
...@@ -2622,17 +2615,8 @@ xlog_recover_cancel_intents( ...@@ -2622,17 +2615,8 @@ xlog_recover_cancel_intents(
spin_lock(&ailp->ail_lock); spin_lock(&ailp->ail_lock);
lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
while (lip != NULL) { while (lip != NULL) {
/* if (!xlog_item_is_intent(lip))
* We're done when we see something other than an intent.
* There should be no intents left in the AIL now.
*/
if (!xlog_item_is_intent(lip)) {
#ifdef DEBUG
for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
ASSERT(!xlog_item_is_intent(lip));
#endif
break; break;
}
spin_unlock(&ailp->ail_lock); spin_unlock(&ailp->ail_lock);
lip->li_ops->iop_release(lip); lip->li_ops->iop_release(lip);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment