- 20 Jul, 2016 23 commits
-
-
Dave Chinner authored
-
Dave Chinner authored
-
Dave Chinner authored
-
Dave Chinner authored
-
Christoph Hellwig authored
Instead we always declare struct xfs_dir2_sf_hdr as packed. That's the expected layout, and while most major architectures do the packing by default the new structure size and offset checker showed that not only the ARM old ABI got this wrong, but various minor embedded architectures did as well. [Verified that no code change on x86-64 results from this change] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
And use an array of unsigned char values directly to avoid problems with architectures that pad the size of structures. This also gets rid of the xfs_dir2_ino4_t and xfs_dir2_ino8_t types, and introduces new constants for the size of 4 and 8 bytes as well as the size difference between the two. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Just use an array of two unsigned chars directly to avoid problems with architectures that pad the size of structures. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
So far the DAX code overloaded the direct I/O code path. There is very little in common between the two, and untangling them allows to clean up both variants. As a side effect we also get separate trace points for both I/O types. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
We control both the callers and callees of ->direct_IO, so remove the indirect calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
XFS already implement it's own flushing of the pagecache because it implements proper synchronization for direct I/O reads. This means calling generic_file_read_iter for direct I/O is rather useless, as it doesn't do much but updating the atime and iocb position for us. This also gets rid of the buffered I/O fallback that isn't used for XFS. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Similar to what we did on the write side a while ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
All the three low-level read implementations that we might call already take care of not overflowing the maximum supported bytes, no need to duplicate it here. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Now that we have the direct I/O kiocb flag there is no real need to sample the value inside of XFS, and the invis flag was always just partially used and isn't worth keeping this infrastructure around for. This also splits the read tracepoint into buffered vs direct as we've done for writes a long time ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Instead check the file pointer for the invisble I/O flag directly, and use the chance to drop redundant arguments from the xfs_ioc_space prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
Newly allocated XFS metadata buffers are added to the LRU once the hold count is released, which typically occurs after I/O completion. There is no other mechanism at current that tracks the existence or I/O state of a new buffer. Further, readahead I/O tends to be submitted asynchronously by nature, which means the I/O can remain in flight and actually complete long after the calling context is gone. This means that file descriptors or any other holds on the filesystem can be released, allowing the filesystem to be unmounted while I/O is still in flight. When I/O completion occurs, core data structures may have been freed, causing completion to run into invalid memory accesses and likely to panic. This problem is reproduced on XFS via directory readahead. A filesystem is mounted, a directory is opened/closed and the filesystem immediately unmounted. The open/close cycle triggers a directory readahead that if delayed long enough, runs buffer I/O completion after the unmount has completed. To address this problem, add a mechanism to track all in-flight, asynchronous buffers using per-cpu counters in the buftarg. The buffer is accounted on the first I/O submission after the current reference is acquired and unaccounted once the buffer is returned to the LRU or freed. Update xfs_wait_buftarg() to wait on all in-flight I/O before walking the LRU list. Once in-flight I/O has completed and the workqueue has drained, all new buffers should have been released onto the LRU. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
The upcoming buftarg I/O accounting mechanism maintains a count of all buffers that have undergone I/O in the current hold-release cycle. Certain buffers associated with core infrastructure (e.g., the xfs_mount superblock buffer, log buffers) are never released, however. This means that accounting I/O submission on such buffers elevates the buftarg count indefinitely and could lead to lockup on unmount. Define a new buffer flag to explicitly exclude buffers from buftarg I/O accounting. Set the flag on the superblock and associated log buffers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
With the code as it stands today, b_retries never increments because it gets reset to 0 in the error callback. Remove that, and fix a similar problem where the first retry time was constantly being overwritten, which defeated the timeout tunable as well. We now only set first retry time if a non-zero timeout is set, to match the behavior of only incrementing retries if a retry value is set. This way max retries & timeouts consistently take effect after a tunable is set, rather than acting retroactively on a buffer which has failed at some point in the past and has accumulated state from those prior failures. Thanks to dchinner for talking through this with me. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
Fix up a couple places where extra flag manipulation occurs. In the first case we clear XBF_ASYNC and then immediately reset it - so don't bother clearing in the first place. In the 2nd case we are at a point in the function where the buffer must already be async, so there is no need to reset it. Add consistent spacing around the " | " while we're at it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
xfs_error_get_cfg() is called with bp->b_error as an arg, which is negative, so the switch statement won't ever find any matches. This results in only the default error handler having any effect, as EIO/ENOSPC/ENODEV get ignored due to the wrong sign. It seems simplest to always flip the error sign to positive, so that we can handle either negative errors in bp->b_error, or possibly a positive errno via something like xfs_error_get_cfg(EIO) - this future-proofs the function. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Hou Tao authored
replace the magic numbers by offsetof(...) and sizeof(...), and add two extra checks on xfs_check_ondisk_structs() [dchinner: renamed header structures to be more descriptive] Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Kaho Ng authored
The indentation in this function is different from the other functions. Those spacebars are converted to tabs to improve readability. Signed-off-by: Kaho Ng <ngkaho1234@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dan Carpenter authored
Errors go from zero which means no error to XFS_ERRTAG_MAX (22). My static checker complains that xfs_errortag_add() puts an upper bound on this but not a lower bound. Let's fix it by making it unsigned. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Jann Horn authored
When calling fdget() in xfs_ioc_swapext(), we need to verify that the file descriptors passed into the ioctl point to XFS inodes before we start operations on them. If we don't do this, we could be referencing arbitrary kernel memory as an XFS inode. THis could lead to memory corruption and/or performing locking operations on attacker-chosen structures in kernel memory. [dchinner: rewrite commit message ] [dchinner: add comment explaining new check ] Signed-off-by: Jann Horn <jann@thejh.net> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 21 Jun, 2016 14 commits
-
-
Dave Chinner authored
-
Darrick J. Wong authored
Create a common function to calculate the maximum height of a per-AG btree. This will eventually be used by the rmapbt and refcountbt code to calculate appropriate maxlevels values for each. This is important because the verifiers and the transaction block reservations depend on accurate estimates of how many blocks are needed to satisfy a btree split. We were mistakenly using the max bnobt height for all the btrees, which creates a dangerous situation since the larger records and keys in an rmapbt make it very possible that the rmapbt will be taller than the bnobt and so we can run out of transaction block reservation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
In struct xfs_bmap_free, convert the open-coded free extent list to a regular list, then use list_sort to sort it prior to processing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Break up xfs_free_extent() into a helper that fixes the freelist. This helper will be used subsequently to ensure the freelist during deferred rmap processing. [darrick: refactor to put this at the head of the patchset] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
This is already in xfsprogs' libxfs, so port it to the kernel. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
Currently we don't check the error_tag when someone's trying to set up error injection testing. If userspace passes in a value we don't know about, send back an error. This will help xfstests to _notrun a test that uses error injection to test things like log replay. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
Create a second buf_trylock tracepoint so that we can distinguish between a successful and a failed trylock. With this piece, we can use a script to look at the ftrace output to detect buffer deadlocks. [dchinner: update to if/else as per hch's suggestion] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
Some of the directory/attr structures contain variable-length objects, so the enclosing structure doesn't have a meaningful fixed size at compile time. We can check the offsets of the members before the variable-length member, so do those. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
xfs_reserve_blocks() is responsible to update the XFS reserved block pool count at mount time or based on user request. When the caller requests to increase the reserve pool, blocks must be allocated from the global counters such that they are no longer available for general purpose use. If the requested reserve pool size is too large, XFS reserves what blocks are available. The implementation requires looking at the percpu counters and making an educated guess as to how many blocks to try and allocate from xfs_mod_fdblocks(), which can return -ENOSPC if the guess was not accurate due to counters being modified in parallel. xfs_reserve_blocks() retries the guess in this scenario until the allocation succeeds or it is determined that there is no space available in the fs. While not easily reproducible in the current form, the retry code doesn't actually work correctly if xfs_mod_fdblocks() actually fails. The problem is that the percpu calculations use the m_resblks counter to determine how many blocks to allocate, but unconditionally update m_resblks before the block allocation has actually succeeded. Therefore, if xfs_mod_fdblocks() fails, the code jumps to the retry label and uses the already updated m_resblks value to determine how many blocks to try and allocate. If the percpu counters previously suggested that the entire request was available, fdblocks_delta could end up set to 0. In that case, m_resblks is updated to the requested value, yet no blocks have been reserved at all. Refactor xfs_reserve_blocks() to use an explicit loop and make the code easier to follow. Since we have to drop the spinlock across the xfs_mod_fdblocks() call, use a delta value for m_resblks as well and only apply the delta once allocation succeeds. [dchinner: convert to do {} while() loop] Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
The filesystem quiesce sequence performs the operations necessary to drain all background work, push pending transactions through the log infrastructure and wait on I/O resulting from the final AIL push. We have had reports of remount,ro hangs in xfs_log_quiesce() -> xfs_wait_buftarg(), however, and some instrumentation code to detect transaction commits at this point in the quiesce sequence has inculpated the eofblocks background scanner as a cause. While higher level remount code generally prevents user modifications by the time the filesystem has made it to xfs_log_quiesce(), the background scanner may still be alive and can perform pending work at any time. If this occurs between the xfs_log_force() and xfs_wait_buftarg() calls within xfs_log_quiesce(), this can lead to an indefinite lockup in xfs_wait_buftarg(). To prevent this problem, cancel the background eofblocks scan worker during the remount read-only quiesce sequence. This suspends background trimming when a filesystem is remounted read-only. This is only done in the remount path because the freeze codepath has already locked out new transactions by the time the filesystem attempts to quiesce (and thus waiting on an active work item could deadlock). Kick the eofblocks worker to pick up where it left off once an fs is remounted back to read-write. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
-
Dave Chinner authored
-
Christoph Hellwig authored
Instead punch the whole first, and the use the our zeroing helper to punch out the edge blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 20 Jun, 2016 3 commits
-
-
Christoph Hellwig authored
We now skip holes in it, so no need to have the caller do it as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
We'll want to use this code for large offsets now that we're skipping holes and unwritten extents efficiently. Also rename it to xfs_zero_range to be a bit more descriptive, and tell the caller if we actually did any zeroing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
-