- 03 Mar, 2014 1 commit
-
-
Steven Whitehouse authored
This patch fixes a long standing issue in mapping the journal extents. Most journals will consist of only a single extent, and although the cache took account of that by merging extents, it did not actually map large extents, but instead was doing a block by block mapping. Since the journal was only being mapped on mount, this was not normally noticeable. With the updated code, it is now possible to use the same extent mapping system during journal recovery (which will be added in a later patch). This will allow checking of the integrity of the journal before any reply of the journal content is attempted. For this reason the code is moving to bmap.c, since it will be used more widely in due course. An exercise left for the reader is to compare the new function gfs2_map_journal_extents() with gfs2_write_alloc_required() Additionally, should there be a failure, the error reporting is also updated to show more detail about what went wrong. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 14 Jan, 2014 1 commit
-
-
Steven Whitehouse authored
While investigating a rather strange bit of code in the quota clean up function, I spotted that the reason for its existence was that when remounting read only, we were not stopping the quotad thread, and thus it was possible for it to still have a reference to some of the quotas in that case. This patch moves the logd and quota thread start and stop into the make_fs_rw/ro functions, so that we now stop those threads when mounted read only. This means that quotad will always be stopped before we call the quota clean up function, and we can thus dispose of the (rather hackish) code that waits for it to give up its reference on the quotas. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
- 27 Sep, 2013 1 commit
-
-
Steven Whitehouse authored
The reservation for an inode should be cleared when it is truncated so that we can start again at a different offset for future allocations. We could try and do better than that, by resetting the search based on where the truncation started from, but this is only a first step. In addition, there are three callers of gfs2_rs_delete() but only one of those should really be testing the value of i_writecount. While we get away with that in the other cases currently, I think it would be better if we made that test specific to the one case which requires it. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 03 Jun, 2013 1 commit
-
-
Bob Peterson authored
This patch makes GFS2 immediately reclaim/delete all iopen glocks as soon as they're dequeued. This allows deleters to get an EXclusive lock on iopen so files are deleted properly instead of being set as unlinked. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 08 Apr, 2013 1 commit
-
-
Bob Peterson authored
The functions that delete block reservations from the rgrp block reservations rbtree no longer use the ip parameter. This patch eliminates the parameter. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 13 Feb, 2013 2 commits
-
-
Eric W. Biederman authored
When reading dinodes from the disk convert uids and gids into kuids and kgids to store in vfs data structures. When writing to dinodes to the disk convert kuids and kgids in the in memory structures into plain uids and gids. For now all on disk data structures are assumed to be stored in the initial user namespace. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
Eric W. Biederman authored
Split NO_QUOTA_CHANGE into NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE so the constants may be well typed. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- 29 Jan, 2013 3 commits
-
-
Steven Whitehouse authored
Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
The freeze code has not been looked at a lot recently. Upstream has moved on, and this is an attempt to catch us back up again. There is a vfs level interface for the freeze code which can be called from our (obsolete, but kept for backward compatibility purposes) sysfs freeze interface. This means freezing this way vs. doing it from the ioctl should now work in identical fashion. As a result of this, the freeze function is only called once and we can drop our own special purpose code for counting the number of freezes. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 07 Nov, 2012 1 commit
-
-
Benjamin Marzinski authored
file_accessed() was being called by gfs2_mmap() with a shared glock. If it needed to update the atime, it was crashing because it dirtied the inode in gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode() checked if the caller was already holding a glock, but it didn't make sure that the glock was in the exclusive state. Now, instead of calling file_accessed() while holding the shared lock in gfs2_mmap(), file_accessed() is called after grabbing and releasing the glock to update the inode. If file_accessed() needs to update the atime, it will grab an exclusive lock in gfs2_dirty_inode(). gfs2_dirty_inode() now also checks to make sure that if the calling process has already locked the glock, it has an exclusive lock. Signed-off-by:
Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 24 Sep, 2012 3 commits
-
-
Benjamin Marzinski authored
If a dirty GFS2 inode was being deleted but was in use by another node, its metadata was not getting written out before GFS2 checked for dirty buffers in gfs2_ail_flush(). GFS2 was relying on inode_go_sync() to write out the metadata when the other node tried to free the file, but it failed the error check before it got that far. This patch writes out the metadata before calling gfs2_ail_flush() Signed-off-by:
Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
The ->show_options() function for GFS2 was not correctly displaying the value when statfs slow in in use. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com> Reported-by:
Milos Jakubicek <xjakub@fi.muni.cz>
-
Steven Whitehouse authored
This patch introduces a new structure, gfs2_rbm, which is a tuple of a resource group, a bitmap within the resource group and an offset within that bitmap. This is designed to make manipulating these sets of variables easier. There is also a new helper function which converts this representation back to a disk block address. In addition, the rbtree nodes which are used for the reservations were not being correctly initialised, which is now fixed. Also, the tracing was not passing through the inode where it should have been. That is mostly fixed aside from one corner case. This needs to be revisited since there can also be a NULL rgrp in some cases which results in the device being incorrect in the trace. This is intended to be the first step towards cleaning up some of the allocation code, and some further bug fixes. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 20 Aug, 2012 1 commit
-
-
Tejun Heo authored
flush[_delayed]_work_sync() are now spurious. Mark them deprecated and convert all users to flush[_delayed]_work(). If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant and the regular flushes guarantee that the work item is not pending or running on any CPU on return, so there's no reason to use the sync flushes at all and they're going away. This patch doesn't make any functional difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mattia Dongili <malattia@linux.it> Cc: Kent Yoder <key@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Bryan Wu <bryan.wu@canonical.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Avi Kivity <avi@redhat.com>
-
- 22 Jul, 2012 2 commits
-
-
Jan Kara authored
Since the moment writes to quota files are using block device page cache and space for quota structures is reserved at the moment they are first accessed we have no reason to sync quota before inode writeback. In fact this order is now only harmful since quota information can easily change during inode writeback (either because conversion of delayed-allocated extents or simply because of allocation of new blocks for simple filesystems not using page_mkwrite). So move syncing of quota information after writeback of inodes into ->sync_fs method. This way we do not have to use ->quota_sync callback which is primarily intended for use by quotactl syscall anyway and we get rid of calling ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which also support legacy non-journalled quotas) and thus there are no dirty quota structures. CC: "Theodore Ts'o" <tytso@mit.edu> CC: Joel Becker <jlbec@evilplan.org> CC: reiserfs-devel@vger.kernel.org Acked-by:
Steven Whitehouse <swhiteho@redhat.com> Acked-by:
Dave Kleikamp <shaggy@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jan Kara authored
Split off part of dquot_quota_sync() which writes dquots into a quota file to a separate function. In the next patch we will use the function from filesystems and we do not want to abuse ->quota_sync quotactl callback more than necessary. Acked-by:
Steven Whitehouse <swhiteho@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 19 Jul, 2012 1 commit
-
-
Bob Peterson authored
This patch reduces GFS2 file fragmentation by pre-reserving blocks. The resulting improved on disk layout greatly speeds up operations in cases which would have resulted in interlaced allocation of blocks previously. A typical example of this is 10 parallel dd processes, each writing to a file in a common dirctory. The implementation uses an rbtree of reservations attached to each resource group (and each inode). Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 08 Jun, 2012 1 commit
-
-
Benjamin Marzinski authored
Instead of reading in the resource groups when gfs2 is checking for free space to allocate from, gfs2 can store the necessary infromation in the resource group's lvb. Also, instead of searching for unlinked inodes in every resource group that's checked for free space, gfs2 can store the number of unlinked but inodes in the lvb, and only check for unlinked inodes if it will find some. The first time a resource group is locked, the lvb must initialized. Since this involves counting the unlinked inodes in the resource group, this takes a little extra time. But after that, if the resource group is locked with GL_SKIP, the buffer head won't be read in unless it's actually needed. Enabling the resource groups lvbs is done via the rgrplvb mount option. If this option isn't set, the lvbs will still be set and updated, but they won't be verfied or used by the filesystem. To safely turn on this option, all of the nodes mounting the filesystem must be running code with this patch, and the filesystem must have been completely unmounted since they were updated. Signed-off-by:
Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 06 Jun, 2012 2 commits
-
-
Bob Peterson authored
This patch moves the ancillary quota data structures into the block reservations structure. This saves GFS2 some time and effort in allocating and deallocating the qadata structure. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Bob Peterson authored
This patch lengthens the lifespan of the reservations structure for inodes. Before, they were allocated and deallocated for every write operation. With this patch, they are allocated when the first write occurs, and deallocated when the last process closes the file. It's more efficient to do it this way because it saves GFS2 a lot of unnecessary allocates and frees. It also gives us more flexibility for the future: (1) we can now fold the qadata structure back into the structure and save those alloc/frees, (2) we can use this for multi-block reservations. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 06 May, 2012 1 commit
-
-
Jan Kara authored
After we moved inode_sync_wait() from end_writeback() it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode() which well says what the function really does - set I_CLEAR flag. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Fengguang Wu <fengguang.wu@intel.com>
-
- 07 Mar, 2012 1 commit
-
-
Steven Whitehouse authored
This ensures that we will not try to access the inode thats being flushed via the glock after it has been freed. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 28 Feb, 2012 1 commit
-
-
Steven Whitehouse authored
The FITRIM ioctl provides an alternative way to send discard requests to the underlying device. Using the discard mount option results in every freed block generating a discard request to the block device. This can be slow, since many block devices can only process discard requests of larger sizes, and also such operations can be time consuming. Rather than using the discard mount option, FITRIM allows a sweep of the filesystem on an occasional basis, and also to optionally avoid sending down discard requests for smaller regions. In GFS2 FITRIM will work at resource group granularity. There is a flag for each resource group which keeps track of which resource groups have been trimmed. This flag is reset whenever a deallocation occurs in the resource group, and set whenever a successful FITRIM of that resource group has taken place. This helps to reduce repeated discard requests for the same block ranges, again improving performance. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 07 Jan, 2012 1 commit
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 04 Jan, 2012 1 commit
-
-
Al Viro authored
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 22 Nov, 2011 1 commit
-
-
Bob Peterson authored
This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. This patch also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 21 Oct, 2011 6 commits
-
-
Steven Whitehouse authored
Unfortunately, it is not enough to just ignore locked buffers during the AIL flush from fsync. We need to be able to ignore all buffers which are locked, dirty or pinned at this stage as they might have been added subsequent to the log flush earlier in the fsync function. In addition, this means that we no longer need to rely on i_mutex to keep out writes during fsync, so we can, as a side-effect, remove that protection too. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com> Tested-By:
Abhijith Das <adas@redhat.com>
-
Steven Whitehouse authored
This means that after the initial allocation for any inode, the last used resource group is cached in the inode for future use. This drastically reduces the number of lookups of resource groups in the common case, and this the contention on that data structure. The allocation algorithm is the same as previously, except that we always check to see if the goal block is within the cached rgrp first before going to the rbtree to look one up. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append only during the life of a super block. This gives several benefits: Firstly, we only need to read new rindex elements as they are added rather than needing to reread the whole rindex file each time one element is added. Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place. Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
The aim of this patch is to use the newly enhanced ->dirty_inode() super block operation to deal with atime updates, rather than piggy backing that code into ->write_inode() as is currently done. The net result is a simplification of the code in various places and a reduction of the number of gfs2_dinode_out() calls since this is now implied by ->dirty_inode(). Some of the mark_inode_dirty() calls have been moved under glocks in order to take advantage of then being able to avoid locking in ->dirty_inode() when we already have suitable locks. One consequence is that generic_write_end() now correctly deals with file size updates, so that we do not need a separate check for that afterwards. This also, indirectly, means that fdatasync should work correctly on GFS2 - the current code always syncs the metadata whether it needs to or not. Has survived testing with postmark (with and without atime) and also fsx. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
If we have got far enough through the inode allocation code path that an inode has already been allocated, then we must call iput to dispose of it, if an error occurs during a later part of the process. This will always be the final iput since there will be no other references to the inode. Unlike when the inode has been unlinked, its block state will be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need to skip the test in ->evict_inode() for this one case in order to ensure that it will be deallocated correctly. This patch adds a new flag in order to ensure that this will happen correctly. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
We do not need to start a transaction unless the atime check has proved positive. Also if we are going to flush the complete ail list anyway, we might as well skip the writeback for this specific inode's metadata, since that will be done as part of the ail writeback process in an order offering potentially more efficient I/O. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 15 Jul, 2011 1 commit
-
-
Steven Whitehouse authored
This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code. There are two follow ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table. The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 14 Jul, 2011 1 commit
-
-
Steven Whitehouse authored
This patch contains a few misc fixes which resolve a recently reported issue. This patch has been a real team effort and has received a lot of testing. The first issue is that the ail lock needs to be held over a few more operations. The lock thats added into gfs2_releasepage() may possibly be a candidate for replacing with RCU at some future point, but at this stage we've gone for the obvious fix. The second issue is that gfs2_write_inode() can end up calling a glock recursively when called from gfs2_evict_inode() via the syncing code, so it needs a guard added. The third issue is that we either need to not truncate the metadata pages of inodes which have zero link count, but which we cannot deallocate due to them still being in use by other nodes, or we need to ensure that those pages have all made it through the journal and ail lists first. This patch takes the former approach, but the latter has also been tested and there is nothing to choos...
-
- 09 May, 2011 2 commits
-
-
Steven Whitehouse authored
Now inode.c is empty. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
This function was intended for debugging purposes, but it is not very useful. If we want to know what is on disk then all we need is a block number and gfs2_edit can give us much better information about what is there. Otherwise, if we are interested in what is stored in the in-core inode, it doesn't help us out there either. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
- 20 Apr, 2011 3 commits
-
-
Steven Whitehouse authored
This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com>
-