1. 07 Oct, 2020 4 commits
    • Darrick J. Wong's avatar
      xfs: xfs_defer_capture should absorb remaining block reservations · 4f9a60c4
      Darrick J. Wong authored
      When xfs_defer_capture extracts the deferred ops and transaction state
      from a transaction, it should record the remaining block reservations so
      that when we continue the dfops chain, we can reserve the same number of
      blocks to use.  We capture the reservations for both data and realtime
      volumes.
      
      This adds the requirement that every log intent item recovery function
      must be careful to reserve enough blocks to handle both itself and all
      defer ops that it can queue.  On the other hand, this enables us to do
      away with the handwaving block estimation nonsense that was going on in
      xlog_finish_defer_ops.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      4f9a60c4
    • Darrick J. Wong's avatar
      xfs: proper replay of deferred ops queued during log recovery · e6fff81e
      Darrick J. Wong authored
      When we replay unfinished intent items that have been recovered from the
      log, it's possible that the replay will cause the creation of more
      deferred work items.  As outlined in commit 50995582 ("xfs: log
      recovery should replay deferred ops in order"), later work items have an
      implicit ordering dependency on earlier work items.  Therefore, recovery
      must replay the items (both recovered and created) in the same order
      that they would have been during normal operation.
      
      For log recovery, we enforce this ordering by using an empty transaction
      to collect deferred ops that get created in the process of recovering a
      log intent item to prevent them from being committed before the rest of
      the recovered intent items.  After we finish committing all the
      recovered log items, we allocate a transaction with an enormous block
      reservation, splice our huge list of created deferred ops into that
      transaction, and commit it, thereby finishing all those ops.
      
      This is /really/ hokey -- it's the one place in XFS where we allow
      nested transactions; the splicing of the defer ops list is is inelegant
      and has to be done twice per recovery function; and the broken way we
      handle inode pointers and block reservations cause subtle use-after-free
      and allocator problems that will be fixed by this patch and the two
      patches after it.
      
      Therefore, replace the hokey empty transaction with a structure designed
      to capture each chain of deferred ops that are created as part of
      recovering a single unfinished log intent.  Finally, refactor the loop
      that replays those chains to do so using one transaction per chain.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      e6fff81e
    • Darrick J. Wong's avatar
      xfs: remove XFS_LI_RECOVERED · 901219bb
      Darrick J. Wong authored
      The ->iop_recover method of a log intent item removes the recovered
      intent item from the AIL by logging an intent done item and committing
      the transaction, so it's superfluous to have this flag check.  Nothing
      else uses it, so get rid of the flag entirely.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      901219bb
    • Darrick J. Wong's avatar
      xfs: remove xfs_defer_reset · b80b29d6
      Darrick J. Wong authored
      Remove this one-line helper since the assert is trivially true in one
      call site and the rest obscures a bitmask operation.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      b80b29d6
  2. 30 Sep, 2020 1 commit
  3. 25 Sep, 2020 13 commits
  4. 23 Sep, 2020 8 commits
    • Gao Xiang's avatar
      xfs: clean up calculation of LR header blocks · 0c771b99
      Gao Xiang authored
      Let's use DIV_ROUND_UP() to calculate log record header
      blocks as what did in xlog_get_iclog_buffer_size() and
      wrap up a common helper for log recovery.
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarGao Xiang <hsiangkao@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      0c771b99
    • Gao Xiang's avatar
      xfs: avoid LR buffer overrun due to crafted h_len · f692d09e
      Gao Xiang authored
      Currently, crafted h_len has been blocked for the log
      header of the tail block in commit a70f9fe5 ("xfs:
      detect and handle invalid iclog size set by mkfs").
      
      However, each log record could still have crafted h_len
      and cause log record buffer overrun. So let's check
      h_len vs buffer size for each log record as well.
      Signed-off-by: default avatarGao Xiang <hsiangkao@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      f692d09e
    • Darrick J. Wong's avatar
      xfs: don't release log intent items when recovery fails · 384ff09b
      Darrick J. Wong authored
      Nowadays, log recovery will call ->release on the recovered intent items
      if recovery fails.  Therefore, it's redundant to release them from
      inside the ->recover functions when they're about to return an error.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      384ff09b
    • Darrick J. Wong's avatar
      xfs: attach inode to dquot in xfs_bui_item_recover · 2dbf872c
      Darrick J. Wong authored
      In the bmap intent item recovery code, we must be careful to attach the
      inode to its dquots (if quotas are enabled) so that a change in the
      shape of the bmap btree doesn't cause the quota counters to be
      incorrect.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      2dbf872c
    • Darrick J. Wong's avatar
      xfs: log new intent items created as part of finishing recovered intent items · 93293bcb
      Darrick J. Wong authored
      During a code inspection, I found a serious bug in the log intent item
      recovery code when an intent item cannot complete all the work and
      decides to requeue itself to get that done.  When this happens, the
      item recovery creates a new incore deferred op representing the
      remaining work and attaches it to the transaction that it allocated.  At
      the end of _item_recover, it moves the entire chain of deferred ops to
      the dummy parent_tp that xlog_recover_process_intents passed to it, but
      fail to log a new intent item for the remaining work before committing
      the transaction for the single unit of work.
      
      xlog_finish_defer_ops logs those new intent items once recovery has
      finished dealing with the intent items that it recovered, but this isn't
      sufficient.  If the log is forced to disk after a recovered log item
      decides to requeue itself and the system goes down before we call
      xlog_finish_defer_ops, the second log recovery will never see the new
      intent item and therefore has no idea that there was more work to do.
      It will finish recovery leaving the filesystem in a corrupted state.
      
      The same logic applies to /any/ deferred ops added during intent item
      recovery, not just the one handling the remaining work.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      93293bcb
    • Darrick J. Wong's avatar
      xfs: check dabtree node hash values when loading child blocks · e581c939
      Darrick J. Wong authored
      When xchk_da_btree_block is loading a non-root dabtree block, we know
      that the parent block had to have a (hashval, address) pointer to the
      block that we just loaded.  Check that the hashval in the parent matches
      the block we just loaded.
      
      This was found by fuzzing nbtree[3].hashval = ones in xfs/394.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      e581c939
    • Darrick J. Wong's avatar
      xfs: don't free rt blocks when we're doing a REMAP bunmapi call · 8df0fa39
      Darrick J. Wong authored
      When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent
      to be unmapped from the given file fork without the extent being freed.
      We do this for non-rt files, but we forgot to do this for realtime
      files.  So far this isn't a big deal since nobody makes a bunmapi call
      to a rt file with the REMAP flag set, but don't leave a logic bomb.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      8df0fa39
    • Chandan Babu R's avatar
      xfs: Set xfs_buf's b_ops member when zeroing bitmap/summary files · c54e14d1
      Chandan Babu R authored
      In xfs_growfs_rt(), we enlarge bitmap and summary files by allocating
      new blocks for both files. For each of the new blocks allocated, we
      allocate an xfs_buf, zero the payload, log the contents and commit the
      transaction. Hence these buffers will eventually find themselves
      appended to list at xfs_ail->ail_buf_list.
      
      Later, xfs_growfs_rt() loops across all of the new blocks belonging to
      the bitmap inode to set the bitmap values to 1. In doing so, it
      allocates a new transaction and invokes the following sequence of
      functions,
        - xfs_rtfree_range()
          - xfs_rtmodify_range()
            - xfs_rtbuf_get()
              We pass '&xfs_rtbuf_ops' as the ops pointer to xfs_trans_read_buf().
              - xfs_trans_read_buf()
      	  We find the xfs_buf of interest in per-ag hash table, invoke
      	  xfs_buf_reverify() which ends up assigning '&xfs_rtbuf_ops' to
      	  xfs_buf->b_ops.
      
      On the other hand, if xfs_growfs_rt_alloc() had allocated a few blocks
      for the bitmap inode and returned with an error, all the xfs_bufs
      corresponding to the new bitmap blocks that have been allocated would
      continue to be on xfs_ail->ail_buf_list list without ever having a
      non-NULL value assigned to their b_ops members. An AIL flush operation
      would then trigger the following warning message to be printed on the
      console,
      
        XFS (loop0): _xfs_buf_ioapply: no buf ops on daddr 0x58 len 8
        00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        CPU: 3 PID: 449 Comm: xfsaild/loop0 Not tainted 5.8.0-rc4-chandan-00038-g4d8c2b9de9ab-dirty #37
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        Call Trace:
         dump_stack+0x57/0x70
         _xfs_buf_ioapply+0x37c/0x3b0
         ? xfs_rw_bdev+0x1e0/0x1e0
         ? xfs_buf_delwri_submit_buffers+0xd4/0x210
         __xfs_buf_submit+0x6d/0x1f0
         xfs_buf_delwri_submit_buffers+0xd4/0x210
         xfsaild+0x2c8/0x9e0
         ? __switch_to_asm+0x42/0x70
         ? xfs_trans_ail_cursor_first+0x80/0x80
         kthread+0xfe/0x140
         ? kthread_park+0x90/0x90
         ret_from_fork+0x22/0x30
      
      This message indicates that the xfs_buf had its b_ops member set to
      NULL.
      
      This commit fixes the issue by assigning "&xfs_rtbuf_ops" to b_ops
      member of each of the xfs_bufs logged by xfs_growfs_rt_alloc().
      Signed-off-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      c54e14d1
  5. 21 Sep, 2020 2 commits
    • Chandan Babu R's avatar
      xfs: Set xfs_buf type flag when growing summary/bitmap files · 72cc9513
      Chandan Babu R authored
      The following sequence of commands,
      
        mkfs.xfs -f -m reflink=0 -r rtdev=/dev/loop1,size=10M /dev/loop0
        mount -o rtdev=/dev/loop1 /dev/loop0 /mnt
        xfs_growfs  /mnt
      
      ... causes the following call trace to be printed on the console,
      
      XFS: Assertion failed: (bip->bli_flags & XFS_BLI_STALE) || (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF), file: fs/xfs/xfs_buf_item.c, line: 331
      Call Trace:
       xfs_buf_item_format+0x632/0x680
       ? kmem_alloc_large+0x29/0x90
       ? kmem_alloc+0x70/0x120
       ? xfs_log_commit_cil+0x132/0x940
       xfs_log_commit_cil+0x26f/0x940
       ? xfs_buf_item_init+0x1ad/0x240
       ? xfs_growfs_rt_alloc+0x1fc/0x280
       __xfs_trans_commit+0xac/0x370
       xfs_growfs_rt_alloc+0x1fc/0x280
       xfs_growfs_rt+0x1a0/0x5e0
       xfs_file_ioctl+0x3fd/0xc70
       ? selinux_file_ioctl+0x174/0x220
       ksys_ioctl+0x87/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x3e/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurs because the buffer being formatted has the value of
      XFS_BLFT_UNKNOWN_BUF assigned to the 'type' subfield of
      bip->bli_formats->blf_flags.
      
      This commit fixes the issue by assigning one of XFS_BLFT_RTSUMMARY_BUF
      and XFS_BLFT_RTBITMAP_BUF to the 'type' subfield of
      bip->bli_formats->blf_flags before committing the corresponding
      transaction.
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      72cc9513
    • Brian Foster's avatar
      xfs: drop extra transaction roll from inode extent truncate · 6dd379c7
      Brian Foster authored
      The inode extent truncate path unmaps extents from the inode block
      mapping, finishes deferred ops to free the associated extents and
      then explicitly rolls the transaction before processing the next
      extent. The latter extent roll is spurious as xfs_defer_finish()
      always returns a clean transaction and automatically relogs inodes
      attached to the transaction (with lock_flags == 0). This can
      unnecessarily increase the number of log ticket regrants that occur
      during a long running truncate operation. Remove the explicit
      transaction roll.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      6dd379c7
  6. 16 Sep, 2020 12 commits