1. 04 May, 2022 21 commits
  2. 28 Apr, 2022 10 commits
    • xfs: rename xfs_*alloc*_log_count to _block_count · 6ed7e509
      Darrick J. Wong authored
      These functions return the maximum number of blocks that could be logged
      in a particular transaction.  "log count" is confusing since there's a
      separate concept of a log (operation) count in the reservation code, so
      let's change it to "block count" to be less confusing.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: rewrite xfs_reflink_end_cow to use intents · df2fd88f
      Darrick J. Wong authored
      Currently, the code that performs CoW remapping after a write has this
      odd behavior where it walks /backwards/ through the data fork to remap
      extents in reverse order.  Earlier, we rewrote the reflink remap
      function to use deferred bmap log items instead of trying to cram as
      much as we could into the first transaction.  Now do the same for the
      CoW remap code.  There doesn't seem to be any performance impact; we're
      just making better use of code that we added for the benefit of reflink.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: reduce transaction reservations with reflink · b037c4ee
      Darrick J. Wong authored
      Prior to the introduction of deferred refcount operations, reflink
      would try to cram refcount btree updates into the same transaction as an
      allocation or a free event.  Mainline XFS has never actually done that,
      but we never refactored the transaction reservations to reflect that we
      now do all refcount updates in separate transactions.  Fix this to
      reduce the transaction reservation size even further, so that between
      this patch and the previous one, we reduce the tr_write and tr_itruncate
      sizes by 66%.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: reduce the absurdly large log operation count · 4ecf9e7c
      Darrick J. Wong authored
      Back in the early days of reflink and rmap development I set the
      transaction reservation sizes to be overly generous for rmap+reflink
      filesystems, and a little under-generous for rmap-only filesystems.
      
      Since we don't need *eight* transaction rolls to handle three new log
      intent items, decrease the logcounts to what we actually need, and amend
      the shadow reservation computation function to reflect what we used to
      do so that the minimum log size doesn't change.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: report "max_resp" used for min log size computation · 918247ce
      Darrick J. Wong authored
      Move the tracepoint that computes the size of the transaction used to
      compute the minimum log size into xfs_log_get_max_trans_res so that we
      only have to compute this stuff once.
      
      Leave xfs_log_get_max_trans_res as a non-static function so that xfs_db
      can call it to report the results of the userspace computation of the
      same value to diagnose mkfs/kernel misinteractions.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: create shadow transaction reservations for computing minimum log size · 52d8ea4f
      Darrick J. Wong authored
      Every time someone changes the transaction reservation sizes, they
      introduce potential compatibility problems if the changes affect the
      minimum log size that we validate at mount time.  If the minimum log
      size gets larger (which should be avoided because doing so presents a
      serious risk of log livelock), filesystems created with old mkfs will
      not mount on a newer kernel; if the minimum size shrinks, filesystems
      created with newer mkfs will not mount on older kernels.
      
      Therefore, enable the creation of a shadow log reservation structure
      where we can "undo" the effects of tweaks when computing minimum log
      sizes.  These shadow reservations should never be used in practice, but
      they insulate us from perturbations in minimum log size.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
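The shadow-reservation idea in 52d8ea4f (which 4ecf9e7c relies on when it lowers the log counts) can be illustrated with a toy model: the live table carries the reduced logcount, while a shadow copy restores the old logcount and is the only table used to size the log. Everything below is a hypothetical sketch; struct trans_res, res_bytes(), min_log_bytes() and make_shadow() are illustrative names, and the "minimum log holds the largest reservation" rule is an assumption, not the kernel's actual formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified model of a transaction reservation.
 * The real kernel structure and minimum-log-size formula are more involved.
 */
struct trans_res {
	uint32_t logres;	/* bytes reserved for each transaction roll */
	uint32_t logcount;	/* number of rolls reserved up front */
};

/* Log space pinned by one reservation: bytes per roll times roll count. */
static uint64_t res_bytes(struct trans_res r)
{
	return (uint64_t)r.logres * r.logcount;
}

/* Toy rule (assumption): the minimum log must hold the largest reservation. */
static uint64_t min_log_bytes(const struct trans_res *res, size_t n)
{
	uint64_t max = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (res_bytes(res[i]) > max)
			max = res_bytes(res[i]);
	return max;
}

/*
 * Shadow copy: identical to the live reservations except that each logcount
 * is put back to the value older kernels used, so lowering the live logcount
 * does not change what min_log_bytes() reports for the shadow table.
 */
static void make_shadow(const struct trans_res *live, struct trans_res *shadow,
			const uint32_t *old_logcount, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		shadow[i] = live[i];
		shadow[i].logcount = old_logcount[i];
	}
}
```

For live reservations {1000, 2} and {500, 3} with an old logcount of 8 for both, the live table yields a toy minimum of 2000 bytes, while the shadow table reproduces the old 8000-byte figure, so mount-time validation would see no change.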
    • xfs: remove a __xfs_bunmapi call from reflink · f1e6a8d7
      Darrick J. Wong authored
      This raw call isn't necessary since we can always remove a full delalloc
      extent.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: stop artificially limiting the length of bunmap calls · 4ed6435c
      Darrick J. Wong authored
      In commit e1a4e37c, we clamped the length of bunmapi calls on the
      data forks of shared files to avoid two failure scenarios: one where the
      extent being unmapped is so sparsely shared that we exceed the
      transaction reservation with the sheer number of refcount btree updates
      and EFI intent items; and the other where we attach so many deferred
      updates to the transaction that we pin the log tail and later the log
      head meets the tail, causing the log to livelock.
      
      We avoid triggering the first problem by tracking the number of ops in
      the refcount btree cursor and forcing a requeue of the refcount intent
      item any time we think that we might be close to overflowing.  This has
      been baked into XFS since before the original e1a4 patch.
      
      A recent patchset fixed the second problem by changing the deferred ops
      code to finish all the work items created by each round of trying to
      complete a refcount intent item, which eliminates the long chains of
      deferred items (27dad); and causing long-running transactions to relog
      their intent log items when space in the log gets low (74f4d).
      
      Because this clamp affects /any/ unmapping request regardless of the
      sharing factors of the component blocks, it degrades the performance of
      all large unmapping requests -- whereas with an unshared file we can
      unmap millions of blocks in one go, shared files are limited to
      unmapping a few thousand blocks at a time, which causes the upper level
      code to spin in a bunmapi loop even if it wasn't needed.
      
      This also eliminates one more place where log recovery behavior can
      differ from online behavior, because bunmapi operations no longer need
      to requeue.  The fstest generic/447 was created to test the old fix, and
      it still passes with this applied.
      
      Partial-revert-of: e1a4e37c ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent")
      Depends: 27dada07 ("xfs: change the order in which child and parent defer ops are finished")
      Depends: 74f4d6a1 ("xfs: only relog deferred intent items if free space in the log gets low")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
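To make the cost of the old clamp concrete, the sketch below counts how many calls an upper-level loop needs to unmap a range, with and without a per-call limit. unmap_once() and unmap_range() are hypothetical stand-ins, not kernel functions, and 3000 is the approximate clamp quoted in commit c47260d4 rather than the exact kernel limit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one bunmapi call: unmaps at most `clamp` blocks
 * (clamp == 0 means no clamp) and returns the number of blocks it unmapped.
 */
static uint64_t unmap_once(uint64_t remaining, uint64_t clamp)
{
	if (clamp != 0 && remaining > clamp)
		return clamp;
	return remaining;
}

/* Upper-level loop: call unmap_once() until the whole range is gone. */
static unsigned long unmap_range(uint64_t len, uint64_t clamp)
{
	unsigned long calls = 0;

	while (len > 0) {
		uint64_t done = unmap_once(len, clamp);

		assert(done > 0);	/* each call must make progress */
		len -= done;
		calls++;
	}
	return calls;
}
```

In this model, unmapping 1,000,000 blocks takes one call without the clamp and 334 calls with a 3000-block clamp, which is the "spin in a bunmapi loop" cost the commit removes for shared files.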
    • xfs: count EFIs when deciding to ask for a continuation of a refcount update · c47260d4
      Darrick J. Wong authored
      A long time ago, I added to XFS the ability to use deferred reference
      count operations as part of a transaction chain.  This enabled us to
      avoid blowing out the transaction reservation when the blocks in a
      physical extent all had different reference counts because we could ask
      the deferred operation manager for a continuation, which would get us a
      clean transaction.
      
      The refcount code asks for a continuation when the number of refcount
      record updates reaches the point where we think that the transaction has
      logged enough full btree blocks due to refcount (and free space) btree
      shape changes and refcount record updates that we're in danger of
      overflowing the transaction.
      
      We did not previously count the EFIs logged to the refcount update
      transaction because the clamps on the length of a bunmap operation were
      sufficient to avoid overflowing the transaction reservation even in the
      worst case situation where every other block of the unmapped extent is
      shared.
      
      Unfortunately, the restrictions on bunmap length avoid failure in the
      worst case by imposing a maximum unmap length of ~3000 blocks, even for
      non-pathological cases.  This seriously limits performance when freeing
      large extents.
      
      Therefore, track EFIs with the same counter as refcount record updates,
      and use that information as input into when we should ask for a
      continuation.  This enables the next patch to drop the clumsy bunmap
      limitation.
      
      Depends: 27dada07 ("xfs: change the order in which child and parent defer ops are finished")
      Depends: 74f4d6a1 ("xfs: only relog deferred intent items if free space in the log gets low")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
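A toy model of the counting change in c47260d4: refcount record updates and EFIs bump one shared counter, and the walk asks for a continuation, modelled here as a transaction roll that resets the counter, once a budget is reached. The struct, the helpers and max_ops are hypothetical; the real budget is derived from the transaction reservation, and the real code does not walk blocks one at a time. The walk uses the worst case named in the commit, where every other block of the unmapped extent is shared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cursor state; not the kernel's btree cursor layout. */
struct refc_cursor {
	unsigned int nr_ops;	/* refcount record updates + EFIs so far */
	unsigned int max_ops;	/* hypothetical budget per transaction */
};

/* Both kinds of logged work feed the same counter. */
static void count_record_update(struct refc_cursor *cur) { cur->nr_ops++; }
static void count_efi(struct refc_cursor *cur) { cur->nr_ops++; }

static bool need_continuation(const struct refc_cursor *cur)
{
	return cur->nr_ops >= cur->max_ops;
}

/*
 * Walk `nblocks` blocks where every other block is shared: shared blocks
 * need a refcount record update, unshared ones are freed via an EFI.
 * Returns how many transactions the walk used.
 */
static unsigned int process_extent(unsigned int nblocks, unsigned int max_ops)
{
	struct refc_cursor cur = { .nr_ops = 0, .max_ops = max_ops };
	unsigned int ntrans = 1;

	assert(max_ops > 0);
	for (unsigned int i = 0; i < nblocks; i++) {
		if (need_continuation(&cur)) {
			/* Continuation: move to a clean transaction. */
			ntrans++;
			cur.nr_ops = 0;
		}
		if (i % 2)
			count_record_update(&cur);
		else
			count_efi(&cur);
	}
	return ntrans;
}
```

With a budget of 4 operations, a 10-block worst-case walk uses 3 transactions and never logs more than 4 items in any one of them, so no external clamp on the unmap length is needed to stay within the budget.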
    • xfs: speed up write operations by using non-overlapped lookups when possible · 1edf8056
      Darrick J. Wong authored
      Reverse mapping on a reflink-capable filesystem has some pretty high
      overhead when performing file operations.  This is because the rmap
      records for logically and physically adjacent extents might not be
      adjacent in the rmap index due to data block sharing.  As a result, we
      use expensive overlapped-interval btree search, which walks every record
      that overlaps with the supplied key in the hopes of finding the record.
      
      However, profiling data shows that when the index contains a record that
      is an exact match for a query key, the non-overlapped btree search
      function can find the record much faster than the overlapped version.
      Try the non-overlapped lookup first when we're trying to find the left
      neighbor rmap record for a given file mapping, which makes unwritten
      extent conversion and remap operations run faster if data block sharing
      is minimal in this part of the filesystem.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
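The lookup order described in 1edf8056 can be sketched over a sorted in-memory array standing in for the rmap btree: try a plain less-than-or-equal lookup first, and fall back to visiting every record that overlaps the previous block only if that record is not the wanted left neighbor. All names here are hypothetical, records carry no flags, and the linear scan merely stands in for the overlapped-interval btree query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified rmap record (real records also carry flags). */
struct rmap_rec {
	uint64_t start;		/* physical start block */
	uint64_t len;		/* length in blocks */
	uint64_t owner;		/* owning inode number */
	uint64_t offset;	/* file offset mapped by `start` */
};

/* Physically and logically adjacent, same owner, ending just before us? */
static int is_left_neighbor(const struct rmap_rec *r, uint64_t bno,
			    uint64_t owner, uint64_t off)
{
	return r->owner == owner && r->start + r->len == bno &&
	       r->offset + r->len == off;
}

/* Index of the last record with start <= key, or -1; recs sorted by start. */
static ptrdiff_t lookup_le(const struct rmap_rec *recs, size_t n, uint64_t key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].start <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (ptrdiff_t)lo - 1;
}

static const struct rmap_rec *
find_left_neighbor(const struct rmap_rec *recs, size_t n, uint64_t bno,
		   uint64_t owner, uint64_t off)
{
	if (bno == 0 || off == 0)
		return NULL;	/* nothing can sit to the left */

	/* Fast path: non-overlapped lookup, cheap when sharing is minimal. */
	ptrdiff_t i = lookup_le(recs, n, bno - 1);
	if (i >= 0 && is_left_neighbor(&recs[i], bno, owner, off))
		return &recs[i];

	/* Slow path: examine every record overlapping block bno - 1. */
	for (size_t j = 0; j < n; j++) {
		const struct rmap_rec *r = &recs[j];

		if (r->start <= bno - 1 && bno - 1 < r->start + r->len &&
		    is_left_neighbor(r, bno, owner, off))
			return r;
	}
	return NULL;
}
```

With records {10, 5, inode 100, offset 0} and {12, 10, inode 200, offset 0}, the neighbor of inode 200 at block 22 is found by the fast path, while the neighbor of inode 100 at block 15 is hidden behind the shared record and needs the fallback scan.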
  3. 27 Apr, 2022 3 commits
  4. 26 Apr, 2022 3 commits
  5. 21 Apr, 2022 3 commits
    • Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next · a44a027a
      Dave Chinner authored
      xfs: Large extent counters
      
      Commit 3f8a4f1d ("xfs: fix inode fork extent count overflow")
      states that 10 billion data fork extents should be possible to
      create. However, the corresponding on-disk field has a signed 32-bit
      type. Hence this patchset extends the per-inode data fork extent
      counter to 64 bits (of which 48 bits are used to store the extent
      count).
      
      Also, XFS has an attribute fork extent counter which is 16 bits
      wide. The following workload causes the xattr extent counter to
      overflow:
      1. Create 1 million 255-byte xattrs.
      2. Delete 50% of these xattrs in an alternating manner.
      3. Try to insert 400,000 new 255-byte xattrs.
      
      Dave tells me that there are instances where a single file has more
      than 100 million hardlinks. With parent pointers being stored in
      xattrs, we will overflow the signed 16-bit attribute extent counter
      when a large number of hardlinks are created. Hence this patchset
      extends the on-disk field to 32 bits.
      
      The following changes are made to accomplish this:
      1. A 64-bit inode field is carved out of the existing di_pad and
         di_flushiter fields to hold the 64-bit data fork extent counter.
      2. The existing 32-bit inode data fork extent counter is reused to
         hold the attribute fork extent counter.
      3. A new incompat superblock flag prevents older kernels from
         mounting the filesystem.
      Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
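The counter widths in this merge translate directly into per-fork limits. The sketch below computes them with a hypothetical helper, max_extents(), assuming the new 32-bit attribute counter is treated as unsigned and that the 48 usable bits of the data fork counter are the low bits; it then checks the 10 billion extent figure from the text against the old and new data fork limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: largest extent count each fork can record.
 * Old layout: signed 32-bit data fork and signed 16-bit attr fork counters.
 * New layout (assumed unsigned): 48 usable bits for data, 32 bits for attr.
 */
static uint64_t max_extents(bool large_extcnt, bool attr_fork)
{
	if (large_extcnt)
		return attr_fork ? UINT32_MAX : (UINT64_C(1) << 48) - 1;
	return attr_fork ? (uint64_t)INT16_MAX : (uint64_t)INT32_MAX;
}

/* Would `nextents` fit in the chosen counter? */
static bool extents_fit(uint64_t nextents, bool large_extcnt, bool attr_fork)
{
	return nextents <= max_extents(large_extcnt, attr_fork);
}
```

The old data fork limit is 2,147,483,647, so 10 billion extents cannot be recorded; the 48-bit limit is 281,474,976,710,655. The attribute fork limit grows from 32,767 to 4,294,967,295.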