- 28 Apr, 2022 9 commits
-
-
Darrick J. Wong authored
Currently, the code that performs CoW remapping after a write has this odd behavior where it walks /backwards/ through the data fork to remap extents in reverse order. Earlier, we rewrote the reflink remap function to use deferred bmap log items instead of trying to cram as much into the first transaction that we could. Now do the same for the CoW remap code. There doesn't seem to be any performance impact; we're just making better use of code that we added for the benefit of reflink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Before to the introduction of deferred refcount operations, reflink would try to cram refcount btree updates into the same transaction as an allocation or a free event. Mainline XFS has never actually done that, but we never refactored the transaction reservations to reflect that we now do all refcount updates in separate transactions. Fix this to reduce the transaction reservation size even farther, so that between this patch and the previous one, we reduce the tr_write and tr_itruncate sizes by 66%. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Back in the early days of reflink and rmap development I set the transaction reservation sizes to be overly generous for rmap+reflink filesystems, and a little under-generous for rmap-only filesystems. Since we don't need *eight* transaction rolls to handle three new log intent items, decrease the logcounts to what we actually need, and amend the shadow reservation computation function to reflect what we used to do so that the minimum log size doesn't change. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Move the tracepoint that computes the size of the transaction used to compute the minimum log size into xfs_log_get_max_trans_res so that we only have to compute this stuff once. Leave xfs_log_get_max_trans_res as a non-static function so that xfs_db can call it to report the results of the userspace computation of the same value to diagnose mkfs/kernel misinteractions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Every time someone changes the transaction reservation sizes, they introduce potential compatibility problems if the changes affect the minimum log size that we validate at mount time. If the minimum log size gets larger (which should be avoided because doing so presents a serious risk of log livelock), filesystems created with old mkfs will not mount on a newer kernel; if the minimum size shrinks, filesystems created with newer mkfs will not mount on older kernels. Therefore, enable the creation of a shadow log reservation structure where we can "undo" the effects of tweaks when computing minimum log sizes. These shadow reservations should never be used in practice, but they insulate us from perturbations in minimum log size. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
This raw call isn't necessary since we can always remove a full delalloc extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
In commit e1a4e37c, we clamped the length of bunmapi calls on the data forks of shared files to avoid two failure scenarios: one where the extent being unmapped is so sparsely shared that we exceed the transaction reservation with the sheer number of refcount btree updates and EFI intent items; and the other where we attach so many deferred updates to the transaction that we pin the log tail and later the log head meets the tail, causing the log to livelock. We avoid triggering the first problem by tracking the number of ops in the refcount btree cursor and forcing a requeue of the refcount intent item any time we think that we might be close to overflowing. This has been baked into XFS since before the original e1a4 patch. A recent patchset fixed the second problem by changing the deferred ops code to finish all the work items created by each round of trying to complete a refcount intent item, which eliminates the long chains of deferred items (27dad); and causing long-running transactions to relog their intent log items when space in the log gets low (74f4d). Because this clamp affects /any/ unmapping request regardless of the sharing factors of the component blocks, it degrades the performance of all large unmapping requests -- whereas with an unshared file we can unmap millions of blocks in one go, shared files are limited to unmapping a few thousand blocks at a time, which causes the upper level code to spin in a bunmapi loop even if it wasn't needed. This also eliminates one more place where log recovery behavior can differ from online behavior, because bunmapi operations no longer need to requeue. The fstest generic/447 was created to test the old fix, and it still passes with this applied. Partial-revert-of: e1a4e37c ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent") Depends: 27dada07 ("xfs: change the order in which child and parent defer ops ar finished") Depends: 74f4d6a1 ("xfs: only relog deferred intent items if free space in the log gets low") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
A long time ago, I added to XFS the ability to use deferred reference count operations as part of a transaction chain. This enabled us to avoid blowing out the transaction reservation when the blocks in a physical extent all had different reference counts because we could ask the deferred operation manager for a continuation, which would get us a clean transaction. The refcount code asks for a continuation when the number of refcount record updates reaches the point where we think that the transaction has logged enough full btree blocks due to refcount (and free space) btree shape changes and refcount record updates that we're in danger of overflowing the transaction. We did not previously count the EFIs logged to the refcount update transaction because the clamps on the length of a bunmap operation were sufficient to avoid overflowing the transaction reservation even in the worst case situation where every other block of the unmapped extent is shared. Unfortunately, the restrictions on bunmap length avoid failure in the worst case by imposing a maximum unmap length of ~3000 blocks, even for non-pathological cases. This seriously limits performance when freeing large extents. Therefore, track EFIs with the same counter as refcount record updates, and use that information as input into when we should ask for a continuation. This enables the next patch to drop the clumsy bunmap limitation. Depends: 27dada07 ("xfs: change the order in which child and parent defer ops ar finished") Depends: 74f4d6a1 ("xfs: only relog deferred intent items if free space in the log gets low") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Reverse mapping on a reflink-capable filesystem has some pretty high overhead when performing file operations. This is because the rmap records for logically and physically adjacent extents might not be adjacent in the rmap index due to data block sharing. As a result, we use expensive overlapped-interval btree search, which walks every record that overlaps with the supplied key in the hopes of finding the record. However, profiling data shows that when the index contains a record that is an exact match for a query key, the non-overlapped btree search function can find the record much faster than the overlapped version. Try the non-overlapped lookup first when we're trying to find the left neighbor rmap record for a given file mapping, which makes unwritten extent conversion and remap operations run faster if data block sharing is minimal in this part of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 27 Apr, 2022 3 commits
-
-
Darrick J. Wong authored
Reverse mapping on a reflink-capable filesystem has some pretty high overhead when performing file operations. This is because the rmap records for logically and physically adjacent extents might not be adjacent in the rmap index due to data block sharing. As a result, we use expensive overlapped-interval btree search, which walks every record that overlaps with the supplied key in the hopes of finding the record. However, profiling data shows that when the index contains a record that is an exact match for a query key, the non-overlapped btree search function can find the record much faster than the overlapped version. Try the non-overlapped lookup first, which will make scrub run much faster. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Most callers of xfs_rmap_lookup_le will retrieve the btree record immediately if the lookup succeeds. The overlapped version of this function (xfs_rmap_lookup_le_range) will return the record if the lookup succeeds, so make the regular version do it too. Get rid of the useless len argument, since it's not part of the lookup key. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Record the buffer ops in the xfs_buf tracepoints so that we can monitor the alleged type of the buffer. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
- 21 Apr, 2022 28 commits
-
-
https://github.com/chandanr/linuxDave Chinner authored
xfs: Large extent counters The commit xfs: fix inode fork extent count overflow (3f8a4f1d) mentions that 10 billion data fork extents should be possible to create. However the corresponding on-disk field has a signed 32-bit type. Hence this patchset extends the per-inode data fork extent counter to 64 bits (out of which 48 bits are used to store the extent count). Also, XFS has an attribute fork extent counter which is 16 bits wide. A workload that, 1. Creates 1 million 255-byte sized xattrs, 2. Deletes 50% of these xattrs in an alternating manner, 3. Tries to insert 400,000 new 255-byte sized xattrs causes the xattr extent counter to overflow. Dave tells me that there are instances where a single file has more than 100 million hardlinks. With parent pointers being stored in xattrs, we will overflow the signed 16-bits wide attribute extent counter when large number of hardlinks are created. Hence this patchset extends the on-disk field to 32-bits. The following changes are made to accomplish this, 1. A 64-bit inode field is carved out of existing di_pad and di_flushiter fields to hold the 64-bit data fork extent counter. 2. The existing 32-bit inode data fork extent counter will be used to hold the attribute fork extent counter. 3. A new incompat superblock flag to prevent older kernels from mounting the filesystem. Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
-
Dave Chinner authored
-
Dave Chinner authored
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. We also pass the fields to log to xfs_btree_offsets() as a uint32_t all cases now. I have no idea why we made that parameter a int64_t in the first place, but while we are fixing this up change it to a uint32_t field, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. This touches xfs_fs.h so affects the user API, but the user API fields are also unsigned so the flags should really be unsigned, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Now that we account for log opheaders in the log item formatting code, we don't actually use the aggregated count of log iovecs in the CIL for anything. Remove it and the tracking code that calculates it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
So remove it from the interface and callers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
The rework of xlog_write() no longer requires xlog_get_iclog_state() to tell it about internal iclog space reservation state to direct it on what to do. Remove this parameter. $ size fs/xfs/xfs_log.o.* text data bss dec hex filename 26520 560 8 27088 69d0 fs/xfs/xfs_log.o.orig 26384 560 8 26952 6948 fs/xfs/xfs_log.o.patched Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Just check that the offset in xlog_write_vec is smaller than the iclog size and remove the expensive cycling through all iclogs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Re-implement writing of a log vector that does not fit into the current iclog. The iclog will already be in XLOG_STATE_WANT_SYNC because xlog_get_iclog_space() will have reserved all the remaining iclog space for us, hence we can simply iterate over the iovecs in the log vector getting more iclog space until the entire log vector is written. Handling this partial write case separately means we do need to pass unnecessary state around for the common, fast path case when the log vector fits entirely within the current iclog. It isolates the complexity and allows us to modify and improve the partial write case without impacting the simple fast path. This change includes several improvements incorporated from patches written by Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Introduce an optimised version of xlog_write() that is used when the entire write will fit in a single iclog. This greatly simplifies the implementation of writing a log vector chain into an iclog, and sets the ground work for a much more understandable xlog_write() implementation. This incorporates some factoring and simplifications proposed by Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Turn ic_datap from a char into a void pointer given that it points to arbitrary data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [dgc: also remove (char *) cast in xlog_alloc_log()] Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
The caller of xlog_write() usually has a close accounting of the aggregated vector length contained in the log vector chain passed to xlog_write(). There is no need to iterate the chain to calculate he length of the data in xlog_write_calculate_len() if the caller is already iterating that chain to build it. Passing in the vector length avoids doing an extra chain iteration, which can be a significant amount of work given that large CIL commits can have hundreds of thousands of vectors attached to the chain. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-