• Dave Chinner's avatar
    xfs: shutdown in intent recovery has non-intent items in the AIL · ab9c81ef
    Dave Chinner authored
    generic/388 triggered a failure in RUI recovery due to a corrupted
    btree record and the system then locked up hard due to a subsequent
    assert failure while holding a spinlock cancelling intents:
    
     XFS (pmem1): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_trans.c:964).  Shutting down filesystem.
     XFS (pmem1): Please unmount the filesystem and rectify the problem(s)
     XFS: Assertion failed: !xlog_item_is_intent(lip), file: fs/xfs/xfs_log_recover.c, line: 2632
     Call Trace:
      <TASK>
      xlog_recover_cancel_intents.isra.0+0xd1/0x120
      xlog_recover_finish+0xb9/0x110
      xfs_log_mount_finish+0x15a/0x1e0
      xfs_mountfs+0x540/0x910
      xfs_fs_fill_super+0x476/0x830
      get_tree_bdev+0x171/0x270
      ? xfs_init_fs_context+0x1e0/0x1e0
      xfs_fs_get_tree+0x15/0x20
      vfs_get_tree+0x24/0xc0
      path_mount+0x304/0xba0
      ? putname+0x55/0x60
      __x64_sys_mount+0x108/0x140
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Essentially, there's dirty metadata in the AIL from intent recovery
    transactions, so when we go to cancel the remaining intents we assume
    that all objects after the first non-intent log item in the AIL are
    not intents.
    
    This is not true. Intent recovery can log new intents to continue
    the operations the original intent could not complete in a single
    transaction. The new intents are committed before they are deferred,
    which means if the CIL commits in the background they will get
    inserted into the AIL at the head.
    
    Hence if we shut down the filesystem while processing intent
    recovery, the AIL may have new intents active at the current head.
    Hence this check:
    
                    /*
                     * We're done when we see something other than an intent.
                     * There should be no intents left in the AIL now.
                     */
                    if (!xlog_item_is_intent(lip)) {
    #ifdef DEBUG
                            for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
                                    ASSERT(!xlog_item_is_intent(lip));
    #endif
                            break;
                    }
    
    in both xlog_recover_process_intents() and
    log_recover_cancel_intents() is simply not valid. It was valid back
    when we only had EFI/EFD intents and didn't chain intents, but it
    hasn't been valid ever since intent recovery could create and commit
    new intents.
    
    Given that crashing the mount task like this pretty much prevents
    diagnosing what went wrong that lead to the initial failure that
    triggered intent cancellation, just remove the checks altogether.
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
    ab9c81ef
xfs_log_recover.c 98.7 KB