1. 21 Oct, 2011 1 commit
    • Steven Whitehouse's avatar
      GFS2: Use ->dirty_inode() · ab9bbda0
      Steven Whitehouse authored
      
      The aim of this patch is to use the newly enhanced ->dirty_inode()
      super block operation to deal with atime updates, rather than
      piggy backing that code into ->write_inode() as is currently
      done.
      
      The net result is a simplification of the code in various places
      and a reduction of the number of gfs2_dinode_out() calls since
      this is now implied by ->dirty_inode().
      
      Some of the mark_inode_dirty() calls have been moved under glocks
      in order to take advantage of then being able to avoid locking in
      ->dirty_inode() when we already have suitable locks.
      
      One consequence is that generic_write_end() now correctly deals
      with file size updates, so that we do not need a separate check
      for that afterwards. This also, indirectly, means that fdatasync
      should work correctly on GFS2 - the current code always syncs the
      metadata whether it needs to or not.
      
      Has survived testing with postmark (with and without atime) and
      also fsx.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      ab9bbda0
  2. 20 Jul, 2011 1 commit
  3. 13 May, 2011 1 commit
  4. 09 May, 2011 2 commits
  5. 20 Apr, 2011 2 commits
    • Steven Whitehouse's avatar
      GFS2: Make writeback more responsive to system conditions · 4667a0ec
      Steven Whitehouse authored
      
      This patch adds writeback_control to writing back the AIL
      list. This means that we can then take advantage of the
      information we get in ->write_inode() in order to set off
      some pre-emptive writeback.
      
      In addition, the AIL code is cleaned up a bit to make it
      a bit simpler to understand.
      
      There is still more which can usefully be done in this area,
      but this is a good start at least.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      4667a0ec
    • Steven Whitehouse's avatar
      GFS2: Optimise glock lru and end of life inodes · f42ab085
      Steven Whitehouse authored
      
      The GLF_LRU flag introduced in the previous patch can be
      used to check if a glock is on the lru list when a new
      holder is queued and if so remove it, without having first
      to get the lru_lock.
      
      The main purpose of this patch however is to optimise the
      glocks left over when an inode at end of life is being
      evicted. Previously such glocks were left with the GLF_LFLUSH
      flag set, so that when reclaimed, each one required a log flush.
      This patch resets the GLF_LFLUSH flag when there is nothing
      left to flush thus preventing later log flushes as glocks are
      reused or demoted.
      
      In order to do this, we need to keep track of the number of
      revokes which are outstanding, and also to clear the GLF_LFLUSH
      bit after a log commit when only revokes have been processed.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      f42ab085
  6. 18 Apr, 2011 1 commit
    • Bob Peterson's avatar
      GFS2: filesystem hang caused by incorrect lock order · 44ad37d6
      Bob Peterson authored
      
      This patch fixes a deadlock in GFS2 where two processes are trying
      to reclaim an unlinked dinode:
      One holds the inode glock and calls gfs2_lookup_by_inum trying to look
      up the inode, which it can't, due to I_FREEING.  The other has set
      I_FREEING from vfs and is at the beginning of gfs2_delete_inode
      waiting for the glock, which is held by the first.  The solution is to
      add a new non_block parameter to the gfs2_iget function that causes it
      to return -ENOENT if the inode is being freed.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      44ad37d6
  7. 18 Jan, 2011 1 commit
    • Steven Whitehouse's avatar
      GFS2: Fix error path in gfs2_lookup_by_inum() · 24d9765f
      Steven Whitehouse authored
      
      In the (impossible, except if there is fs corruption) error path
      in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
      fails, it was leaving the function by calling iput() rather
      than iget_failed(). This would cause future lookups of the same
      inode to block forever.
      
      This patch fixes the problem by moving the call to gfs2_inode_refresh()
      into gfs2_inode_lookup() where iget_failed() is part of the error path
      already. Also this cleans up some unreachable code and makes
      gfs2_set_iop() static.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      24d9765f
  8. 07 Jan, 2011 1 commit
  9. 15 Nov, 2010 1 commit
    • Steven Whitehouse's avatar
      GFS2: Fix inode deallocation race · 044b9414
      Steven Whitehouse authored
      
      This area of the code has always been a bit delicate due to the
      subtleties of lock ordering. The problem is that for "normal"
      alloc/dealloc, we always grab the inode locks first and the rgrp lock
      later.
      
      In order to ensure no races in looking up the unlinked, but still
      allocated inodes, we need to hold the rgrp lock when we do the lookup,
      which means that we can't take the inode glock.
      
      The solution is to borrow the technique already used by NFS to solve
      what is essentially the same problem (given an inode number, look up
      the inode carefully, checking that it really is in the expected
      state).
      
      We cannot do that directly from the allocation code (lock ordering
      again) so we give the job to the pre-existing delete workqueue and
      carry on with the allocation as normal.
      
      If we find there is no space, we do a journal flush (required anyway
      if space from a deallocation is to be released) which should block
      against the pending deallocations, so we should always get the space
      back.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      044b9414
  10. 20 Sep, 2010 2 commits
    • Benjamin Marzinski's avatar
      GFS2: fallocate support · 3921120e
      Benjamin Marzinski authored
      
      This patch adds support for fallocate to gfs2.  Since the gfs2 does not support
      uninitialized data blocks, it must write out zeros to all the blocks.  However,
      since it does not need to lock any pages to read from, gfs2 can write out the
      zero blocks much more efficiently.  On a moderately full filesystem, fallocate
      works around 5 times faster on average.  The fallocate call also allows gfs2 to
      add blocks to the file without changing the filesize, which will make it
      possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
      grow a completely full filesystem.
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      3921120e
    • Steven Whitehouse's avatar
      GFS2: Remove i_disksize · a2e0f799
      Steven Whitehouse authored
      
      With the update of the truncate code, ip->i_disksize and
      inode->i_size are merely copies of each other. This means
      we can remove ip->i_disksize and use inode->i_size exclusively
      reducing the size of a GFS2 inode by 8 bytes.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      a2e0f799
  11. 21 May, 2010 1 commit
  12. 14 Apr, 2010 1 commit
    • Bob Peterson's avatar
      GFS2: glock livelock · 1a0eae88
      Bob Peterson authored
      
      This patch fixes a couple gfs2 problems with the reclaiming of
      unlinked dinodes.  First, there were a couple of livelocks where
      everything would come to a halt waiting for a glock that was
      seemingly held by a process that no longer existed.  In fact, the
      process did exist, it just had the wrong pid number in the holder
      information.  Second, there was a lock ordering problem between
      inode locking and glock locking.  Third, glock/inode contention
      could sometimes cause inodes to be improperly marked invalid by
      iget_failed.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      1a0eae88
  13. 22 May, 2009 4 commits
  14. 15 Apr, 2009 1 commit
  15. 24 Mar, 2009 1 commit
    • Steven Whitehouse's avatar
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse authored
      
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      f057f6cd
  16. 05 Jan, 2009 2 commits
  17. 18 Sep, 2008 1 commit
    • Steven Whitehouse's avatar
      GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse authored
      
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      719ee344
  18. 27 Aug, 2008 1 commit
    • Steven Whitehouse's avatar
      GFS2: Fix & clean up GFS2 rename · 0188d6c5
      Steven Whitehouse authored
      
      This patch fixes a locking issue in the rename code by ensuring that we hold
      the per sb rename lock over both directory and "other" renames which involve
      different parent directories.
      
      At the same time, this moved the (only called from one place) function
      gfs2_ok_to_move into the file that its called from, so we can mark it
      static. This should make a code a bit easier to follow.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: Peter Staubach <staubach@redhat.com>
      0188d6c5
  19. 27 Jul, 2008 1 commit
  20. 10 Jul, 2008 1 commit
  21. 03 Jul, 2008 1 commit
    • Miklos Szeredi's avatar
      [GFS2] don't call permission() · f58ba889
      Miklos Szeredi authored
      
      GFS2 calls permission() to verify permissions after locks on the files
      have been taken.
      
      For this it's sufficient to call gfs2_permission() instead.  This
      results in the following changes:
      
        - IS_RDONLY() check is not performed
        - IS_IMMUTABLE() check is not performed
        - devcgroup_inode_permission() is not called
        - security_inode_permission() is not called
      
      IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
      flag should provide protection against read-only remounts during
      operations.  do_gfs2_set_flags() has been fixed to perform
      mnt_want_write()/mnt_drop_write() to protect against remounting
      read-only.
      
      IS_IMMUTABLE has been added to gfs2_permission()
      
      Repeating the security checks seems to be pointless, as they don't
      normally change, and if they do, it's independent of the filesystem
      state.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      f58ba889
  22. 31 Mar, 2008 2 commits
    • Steven Whitehouse's avatar
      [GFS2] Eliminate (almost) duplicate field from gfs2_inode · 77658aad
      Steven Whitehouse authored
      
      The blocks counter is almost a duplicate of the i_blocks
      field in the VFS inode. The only difference is that i_blocks
      can be only 32bits long for 32bit arch without large single file
      support. Since GFS2 doesn't handle the non-large single file
      case (for 32 bit anyway) this adds a new config dependency on
      64BIT || LSF. This has always been the case, however we've never
      explicitly said so before.
      
      Even if we do add support for the non-LSF case, we will still
      not require this field to be duplicated since we will not be
      able to access oversized files anyway.
      
      So the net result of all this is that we shave 8 bytes from a gfs2_inode
      and get our config deps correct.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      77658aad
    • Steven Whitehouse's avatar
      [GFS2] Streamline indirect pointer tree height calculation · ecc30c79
      Steven Whitehouse authored
      
      This patch improves the calculation of the tree height in order to reduce
      the number of operations which are carried out on each call to gfs2_block_map.
      In the common case, we now make a single comparison, rather than calculating
      the required tree height from scratch each time. Also in the case that the
      tree does need some extra height, we start from the current height rather from
      zero when we work out what the new height ought to be.
      
      In addition the di_height field is moved into the inode proper and reduced
      in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      ecc30c79
  23. 25 Jan, 2008 2 commits
    • Steven Whitehouse's avatar
      [GFS2] Introduce gfs2_set_aops() · 5561093e
      Steven Whitehouse authored
      
      Just like ext3 we now have three sets of address space operations
      to cover the cases of writeback, ordered and journalled data
      writes. This means that the individual operations can now become
      less complicated as we are able to remove some of the tests for
      file data mode from the code.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      5561093e
    • Steven Whitehouse's avatar
      [GFS2] Add gfs2_is_writeback() · bf36a713
      Steven Whitehouse authored
      
      This adds a function "gfs2_is_writeback()" along the lines of the
      existing "gfs2_is_jdata()" in order to clean up the code and make
      the various tests for the inode mode more obvious. It also fixes
      the PageChecked() logic where we were resetting the flag too early
      in the case of an error path.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      bf36a713
  24. 10 Oct, 2007 1 commit
    • Benjamin Marzinski's avatar
      [GFS2] Alternate gfs2_iget to avoid looking up inodes being freed · 7a9f53b3
      Benjamin Marzinski authored
      
      There is a possible deadlock between two processes on the same node, where one
      process is deleting an inode, and another process is looking for allocated but
      unused inodes to delete in order to create more space.
      
      process A does an iput() on inode X, and it's i_count drops to 0. This causes
      iput_final() to be called, which puts an inode into state I_FREEING at
      generic_delete_inode(). There no point between when iput_final() is called, and
      when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
      set, no other process on that node can successfully look up that inode until
      the delete finishes.
      
      process B locks the the resource group for the same inode in get_local_rgrp(),
      which is called by gfs2_inplace_reserve_i()
      
      process A tries to lock the resource group for the inode in
      gfs2_dinode_dealloc(), but it's already locked by process B
      
      process B waits in find_inode for the inode to have the I_FREEING state cleared.
      
      Deadlock.
      
      This patch solves the problem by adding an alternative to gfs2_iget(),
      gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
      state.o The alternate test function is just like the original one, except that
      it fails if the inode is being freed, and sets a skipped flag. The alternate
      set function is just like the original, except that it fails if the skipped
      flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
      gfs2_iget().
      Signed-off-by: default avatarBenjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      7a9f53b3
  25. 09 Jul, 2007 4 commits
    • Wendy Cheng's avatar
      [GFS2] Remove i_mode passing from NFS File Handle · 35dcc52e
      Wendy Cheng authored
      
      GFS2 has been passing i_mode within NFS File Handle. Other than the
      wrong assumption that there is always room for this extra 16 bit value,
      the current gfs2_get_dentry doesn't really need the i_mode to work
      correctly. Note that GFS2 NFS code does go thru the same lookup code
      path as direct file access route (where the mode is obtained from name
      lookup) but gfs2_get_dentry() is coded for different purpose. It is not
      used during lookup time. It is part of the file access procedure call.
      When the call is invoked, if on-disk inode is not in-memory, it has to
      be read-in. This makes i_mode passing a useless overhead.
      Signed-off-by: default avatarS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      35dcc52e
    • Wendy Cheng's avatar
      [GFS2] Obtaining no_formal_ino from directory entry · bb9bcf06
      Wendy Cheng authored
      
      GFS2 lookup code doesn't ask for inode shared glock. This implies during
      in-memory inode creation for existing file, GFS2 will not disk-read in
      the inode contents. This leaves no_formal_ino un-initialized during
      lookup time. The un-initialized no_formal_ino is subsequently encoded
      into file handle. Clients will get ESTALE error whenever it tries to
      access these files.
      Signed-off-by: default avatarS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      bb9bcf06
    • Steven Whitehouse's avatar
      [GFS2] Fix sign problem in quota/statfs and cleanup _host structures · bb8d8a6f
      Steven Whitehouse authored
      
      This patch fixes some sign issues which were accidentally introduced
      into the quota & statfs code during the endianess annotation process.
      Also included is a general clean up which moves all of the _host
      structures out of gfs2_ondisk.h (where they should not have been to
      start with) and into the places where they are actually used (often only
      one place). Also those _host structures which are not required any more
      are removed entirely (which is the eventual plan for all of them).
      
      The conversion routines from ondisk.c are also moved into the places
      where they are actually used, which for almost every one, was just one
      single place, so all those are now static functions. This also cleans up
      the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.
      
      The net result is a reduction of about 100 lines of code, many functions
      now marked static plus the bug fixes as mentioned above. For good
      measure I ran the code through sparse after making these changes to
      check that there are no warnings generated.
      
      This fixes Red Hat bz #239686
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      bb8d8a6f
    • Steven Whitehouse's avatar
      [GFS2] Clean up inode number handling · dbb7cae2
      Steven Whitehouse authored
      
      This patch cleans up the inode number handling code. The main difference
      is that instead of looking up the inodes using a struct gfs2_inum_host
      we now use just the no_addr member of this structure. The tests relating
      to no_formal_ino can then be done by the calling code. This has
      advantages in that we want to do different things in different code
      paths if the no_formal_ino doesn't match. In the NFS patch we want to
      return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
      no_formal_ino doesn't match and thus we can withdraw in this case.
      
      In order to later fix bz #201012, we need to be able to look up an inode
      without knowing no_formal_ino, as the only information that is known to
      us is the on-disk location of the inode in question.
      
      This patch will also help us to fix bz #236099 at a later date by
      cleaning up a lot of the code in that area.
      
      There are no user visible changes as a result of this patch and there
      are no changes to the on-disk format either.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      dbb7cae2
  26. 05 Feb, 2007 3 commits
    • Adrian Bunk's avatar
      [GFS2] make gfs2_change_nlink_i() static · 03dc6a53
      Adrian Bunk authored
      
      On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
      >...
      > Changes since 2.6.20-rc3-mm1:
      >...
      >  git-gfs2-nmw.patch
      >...
      >  git trees
      >...
      
      This patch makes the needlessly globlal gfs2_change_nlink_i() static.
      Signed-off-by: default avatarAdrian Bunk <bunk@stusta.de>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      03dc6a53
    • S. Wendy Cheng's avatar
      [GFS2] Fix gfs2_rename deadlock · 87d21e07
      S. Wendy Cheng authored
      
      Second round of gfs2_rename lock re-ordering to allow Anaconda adding
      root partition on top of gfs2. Previous to this patch the recursive
      lock detector in glock.c can be triggered due to attempting to lock
      the rgrp twice. This fixes it by checking to see whether the rgrp
      is already locked.
      
      This fixes Red Hat bugzilla #221237
      Signed-off-by: default avatarS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      87d21e07
    • S. Wendy Cheng's avatar
      [GFS2] Fix change nlink deadlock · 5509826f
      S. Wendy Cheng authored
      
      Bugzilla 215088
      
      Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
      partition. The gfs2_rename() apparently needs block allocation for the
      new name (into the directory) where it requires rg locks. At the same
      time, while updating the nlink count for the replaced file,
      gfs2_change_nlink() tries to return the inode meta-data back to resource
      group where it needs rg locks too. Our logic doesn't allow process to
      acquire these locks recursively by the same process  (RHEL installer)
      that results a BUG call. This only happens within rename code path and
      only if the destination file exists before the rename operation.
      Signed-off-by: default avatarS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      5509826f