- 30 Oct, 2008 40 commits
-
-
David Chinner authored
When copying lsn's from the log item to the inode or dquot flush lsn, we currently grab the AIL lock. We do this because the LSN is a 64 bit quantity and it needs to be read atomically. The lock is used to guarantee atomicity for 32 bit platforms. Make the LSN copying a small function, and make the function used conditional on BITS_PER_LONG so that 64 bit machines don't need to take the AIL lock in these places. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32349a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
With the new cursor interface, it makes sense to make all the traversing code use the cursor interface and make the old one go away. This means more of the AIL interfacing is done by passing struct xfs_ail pointers around the place instead of struct xfs_mount pointers. We can replace the use of xfs_trans_first_ail() in xfs_log_need_covered() as it is only checking if the AIL is empty. We can do that with a call to xfs_trans_ail_tail() instead, where a zero LSN returned indicates and empty AIL... SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32348a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
To replace the current generation number ensuring sanity of the AIL traversal, replace it with an external cursor that is linked to the AIL. Basically, we store the next item in the cursor whenever we want to drop the AIL lock to do something to the current item. When we regain the lock. the current item may already be free, so we can't reference it, but the next item in the traversal is already held in the cursor. When we move or delete an object, we search all the active cursors and if there is an item match we clear the cursor(s) that point to the object. This forces the traversal to restart transparently. We don't invalidate the cursor on insert because the cursor still points to a valid item. If the intem is inserted between the current item and the cursor it does not matter; the traversal is considered to be past the insertion point so it will be picked up in the next traversal. Hence traversal restarts pretty much disappear altogether with this method of traversal, which should substantially reduce the overhead of pushing on a busy AIL. Version 2 o add restart logic o comment cursor interface o minor cleanups SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32347a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Rather than embedding the struct xfs_ail in the struct xfs_mount, allocate it during AIL initialisation. Add a back pointer to the struct xfs_ail so that we can pass around the xfs_ail and still be able to access the xfs_mount if need be. This is th first step involved in isolating the AIL implementation from the surrounding filesystem code. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32346a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
When we create a directory, we reserve a number of blocks for the maximum possible expansion of of the directory due to various btree splits, freespace allocation, etc. Unfortunately, each allocation is not reflected in the total number of blocks still available to the transaction, so the maximal reservation is used over and over again. This leads to problems where an allocation group has only enough blocks for *some* of the allocations required for the directory modification. After the first N allocations, the remaining blocks in the allocation group drops below the total reservation, and subsequent allocations fail because the allocator will not allow the allocation to proceed if the AG does not have the enough blocks available for the entire allocation total. This results in an ENOSPC occurring after an allocation has already occurred. This results in aborting the directory operation (leaving the directory in an inconsistent state) and cancelling a dirty transaction, which results in a filesystem shutdown. Avoid the problem by reflecting the number of blocks allocated in any directory expansion in the total number of blocks available to the modification in progress. This prevents a directory modification from being aborted part way through with an ENOSPC. SGI-PV: 988144 SGI-Modid: xfs-linux-melb:xfs-kern:32340a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
If the last block of the AG has inodes in it and the AG is an exactly power-of-2 size then the last inode in the AG points to the last block in the AG. If we try to find the next inode in the AG by adding one to the inode number, we increment the inode number past the size of the AG. The result is that the macro XFS_INO_TO_AGINO() will strip the AG portion of the inode number and return an inode number of zero. That is, instead of terminating the lookup loop because we hit the inode number went outside the valid range for the AG, the search index returns to zero and we start traversing the radix tree from the start again. This results in an endless loop in xfs_sync_inodes_ag(). Fix it be detecting if the new search index decreases as a result of incrementing the current inode number. That indicate an overflow and hence that we have finished processing the AG so we can terminate the loop. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32335a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Now that the deleted inodes list is unused, kill it. This also removes the i_reclaim list head from the xfs_inode, shrinking it by two pointers. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32334a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Use the reclaim tag to walk the radix tree and find the inodes under reclaim. This was the only user of the deleted inode list. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32333a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Prepare for removing the deleted inode list by marking inodes for reclaim in the inode radix trees so that we can use the radix trees to find reclaimable inodes. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32331a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
The function names xfs_finish_reclaim and xfs_finish_reclaim_all are not very descriptive of what they are reclaiming. Rename to xfs_reclaim_inode[s] to match the xfs_sync_inodes() function. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32330a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Background inode reclaim is run by the xfssyncd. Move the reclaim worker functions to be close to the sync code as the are very similar in structure and are both run from the same background thread. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32329a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
Lachlan McIlroy authored
SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32325a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
With the combined linux and XFS inode, we need to ensure that the combined structure is not freed before the generic code is finished with the inode. As it turns out, there is a case where the XFS inode is freed before the linux inode - when xfs_reclaim() is called from ->clear_inode() on a clean inode, the xfs inode is freed during that call. The generic code references the inode after the ->clear_inode() call, so this is a use after free situation. Fix the problem by moving the xfs_reclaim() call to ->destroy_inode() instead of in ->clear_inode(). This ensures the combined inode structure is not freed until after the generic code has finished with it. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32324a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
To avoid issues with different lifecycles of XFS and Linux inodes, embedd the linux inode inside the XFS inode. This means that the linux inode has the same lifecycle as the XFS inode, even when it has been released by the OS. XFS inodes don't live much longer than this (a short stint in reclaim at most), so there isn't significant memory usage penalties here. Version 3 o kill xfs_icount() Version 2 o remove unused commented out code from xfs_iget(). o kill useless cast in VFS_I() SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32323a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
To allow XFS to combine the XFS and linux inodes into a single structure, we need to drive inode lookup from the XFS inode cache, not the generic inode cache. This means that we need initialise a struct inode from a context outside alloc_inode() as it is no longer used by XFS. After inode allocation and initialisation, we need to add the inode to the superblock list, the in-use list, hash it and do some accounting. This all needs to be done with the inode_lock held and there are already several places in fs/inode.c that do this list manipulation. Factor out the common code, add a locking wrapper and export the function so ti can be called from XFS. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
To allow XFS to combine the XFS and linux inodes into a single structure, we need to drive inode lookup from the XFS inode cache, not the generic inode cache. This means that we need initialise a struct inode from a context outside alloc_inode() as it is no longer used by XFS. Factor and export the struct inode initialisation code from alloc_inode() to inode_init_always() as a counterpart to inode_init_once(). i.e. we have to call this init function for each inode instantiation (always), as opposed inode_init_once() which is only called on slab object instantiation (once). Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
Once the Linux inode and the XFS inode are combined, we cannot rely on just check if the linux inode exists as a method of determining if it is valid or not. Hence we should always call xfs_mark_inode_dirty_sync() instead as it does the correct checks to determine if the liinux inode is in a valid state or not. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32318a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
There are really two cases in xfs_iget_core(). The first is the cache hit case, the second is the miss case. They share very little code, and hence can easily be factored out into separate functions. This makes the code much easier to understand and subsequently modify. SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32317a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
Christoph Hellwig authored
We can only read inode->i_count if the inode is actually there and not a NULL pointer. This was introduced in one of the recent sync patches. SGI-PV: 988255 SGI-Modid: xfs-linux-melb:xfs-kern:32315a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
With all the other filesystem sync code it in xfs_sync.c including the data quiesce code, it makes sense to move the remaining quiesce code to the same place. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32312a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
There are no more callers to xfs_sync() now, so remove the function altogther. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32311a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
SYNC_CLOSE is only ever used and checked in conjunction with SYNC_WAIT, and this only done in one spot. The only thing this does is make XFS_bflush() calls to the data buftargs. This will happen very shortly afterwards the xfs_sync() call anyway in the unmount path via the xfs_close_devices(), so this code is redundant and can be removed. That only user of SYNC_CLOSE is now gone, so kill the flag completely. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32310a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Continue to de-multiplex xfs_sync be replacing all SYNC_DELWRI callers with direct calls functions that do the work. Isolate the data quiesce case to a function in xfs_sync.c. Isolate the FSDATA case with explicit calls to xfs_sync_fsdata(). Version 2: o Push delwri related log forces into xfs_sync_inodes(). SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32309a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Continue to de-multiplex xfs_sync be replacing all SYNC_ATTR callers with direct calls xfs_sync_inodes(). Add an assert into xfs_sync() to ensure we caught all the SYNC_ATTR callers. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32308a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Start de-multiplexing xfs_sync() by making xfs_sync_worker() call the specific sync functions it needs. This is only a small, unique subset of the entire xfs_sync() code so is easier to follow. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32307a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Now that the only caller is xfs_sync(), merge the two together as it makes no sense to keep them separate. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32306a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Kill the unused arg in xfs_syncsub() and xfs_sync_inodes(). For callers of xfs_syncsub() that only want to flush inodes, replace xfs_syncsub() with direct calls to xfs_sync_inodes() as that is all that is being done with the specific flags being passed in. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32305a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
With the sync code relocated to the linux-2.6 directory we can use struct inodes directly. If we do the same thing for the quota release code, we can remove vn_grab altogether. While here, convert the VN_BAD() checks to is_bad_inode() so we can remove vnodes entirely from this code. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32304a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
Christoph Hellwig authored
Split out two helpers from xfs_syncsub for the dummy log commit and the superblock writeout. SGI-PV: 988140 SGI-Modid: xfs-linux-melb:xfs-kern:32303a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Christoph Hellwig authored
Move the XFS_BMAP_SANITY_CHECK macro out of line and make it a properly typed function. Also pass the xfs_buf for the btree block instead of just the btree block header, as we will need some additional information for it to implement CRC checking of btree blocks. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32301a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Christoph Hellwig authored
structures. Always use the generic xfs_btree_block type instead of the short / long structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for the length of a short / long form block. The rationale for this is that we will grow more btree block header variants to support CRCs and other RAS information, and always accessing them through the same datatype with unions for the short / long form pointers makes implementing this much easier. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32300a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
Christoph Hellwig authored
Replace the generic record / key / ptr addressing macros that use cpp token pasting with simpler macros that do the job for just one given btree type. The new macros lose the cur argument and thus can be used outside the core btree code, but also gain an xfs_mount * argument to allow for checking the CRC flag in the near future. Note that many of these macros aren't actually used in the kernel code, but only in userspace (mostly in xfs_repair). SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32295a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
Now we've removed all users of the mount inode list, we can kill it. This reduces the size of the xfs_inode by 2 pointers. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32293a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
Christoph Hellwig authored
Clean up the way the maximum and minimum records for the btree blocks are calculated. For the alloc and inobt btrees all the values are pre-calculated in xfs_mount_common, and we switch the current loop around the ugly generic macros that use cpp token pasting to generate type names to two small helpers in normal C code. For the bmbt and bmdr trees these helpers also exist, but can be called during runtime, too. Here we also kill various macros dealing with them and inline the logic into the get_minrecs / get_maxrecs / get_dmaxrecs methods in xfs_bmap_btree.c. Note that all these new helpers take an xfs_mount * argument which will be needed to determine the size of a btree block once we add support for extended btree blocks with CRCs and other RAS information. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32292a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
Make releasing all inode dquots traverse the per-ag inode radix trees rather than the mount inode list. This removes another user of the mount inode list. Version 3 o fix comment relating to avoiding trying to release the quota inodes and those in reclaim. Version 2 o add comment explaining use of gang lookups for a single inode o use IRELE, not VN_RELE o move check for ag initialisation to caller. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32291a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Update xfs_sync_inodes to walk the inode radix tree cache to find dirty inodes. This removes a huge bunch of nasty, messy code for traversing the mount inode list safely and removes another user of the mount inode list. Version 3 o rediff against new linux-2.6/xfs_sync.c code Version 2 o add comment explaining use of gang lookups for a single inode o use IRELE, not VN_RELE o move check for ag initialisation to caller. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32290a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Normally dquots are written back via delayed write mechanisms. They are flushed to their backing buffer by xfssyncd, which is then pushed out by either AIL or xfsbufd flushing. The flush from the xfssyncd is supposed to be non-blocking, but xfs_qm_dqflush() always waits for pinned duots, which means that it will block for the length of time it takes to do a synchronous log force. This causes unnecessary extra log I/O to be issued whenever we try to flush a busy dquot. Avoid the log forces and blocking xfssyncd by making xfs_qm_dqflush() pay attention to what type of sync it is doing when it sees a pinned dquot and not waiting when doing non-blocking flushes. SGI-PV: 988147 SGI-Modid: xfs-linux-melb:xfs-kern:32287a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
-
David Chinner authored
xfs_iflush_all() walks the m_inodes list to find inodes that need reclaiming. We already have such a list - the m_del_inodes list. Replace xfs_iflush_all() with a call to xfs_finish_reclaim_all() and clean up xfs_finish_reclaim_all() to handle the different flush modes now needed. Originally based on a patch from Christoph Hellwig. Version 3 o rediff against new linux-2.6/xfs_sync.c code Version 2 o revert xfs_syncsub() inode reclaim behaviour back to original code o xfs_quiesce_fs() should use XFS_IFLUSH_DELWRI_ELSE_ASYNC, not XFS_IFLUSH_ASYNC, to prevent change of behaviour. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32284a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
Move all the xfssyncd code to the new xfs_sync.c file. This places it closer to the actual code that it interacts with, rather than just being associated with high level VFS code. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32283a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-
David Chinner authored
The sync code in XFS is spread around several files. While it used to make sense to have such a distribution, the code is about to be cleaned up and so centralising it in one spot as the first step makes sense. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32282a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
-