1. 05 Feb, 2007 16 commits
    • [GFS2] Fix gfs2_rename deadlock · 87d21e07
      S. Wendy Cheng authored
      Second round of gfs2_rename lock re-ordering to allow Anaconda adding
      root partition on top of gfs2. Previous to this patch the recursive
      lock detector in glock.c can be triggered due to attempting to lock
      the rgrp twice. This fixes it by checking to see whether the rgrp
      is already locked.
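The "check whether the rgrp is already locked" idea can be illustrated with a small Python sketch. This is not the GFS2 code: `NonRecursiveLock`, `RecursiveLockError` and `lock_rgrp_once` are hypothetical names, and the toy lock models a single owner only, without contention.

```python
class RecursiveLockError(Exception):
    """Stands in for the BUG raised by a recursive-lock detector."""


class NonRecursiveLock:
    """Toy single-owner lock that refuses a second acquisition by its owner."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def acquire(self, owner):
        if self.owner == owner:
            raise RecursiveLockError(f"{owner} already holds {self.name}")
        self.owner = owner

    def release(self, owner):
        assert self.owner == owner
        self.owner = None


def lock_rgrp_once(lock, owner):
    """Acquire the lock unless this owner already holds it.

    Returns True if the lock was taken here (the caller must release it),
    False if it was already held further up the call chain.
    """
    if lock.owner == owner:
        return False
    lock.acquire(owner)
    return True


rgrp = NonRecursiveLock("rgrp-0")
took_outer = lock_rgrp_once(rgrp, "rename")  # first acquisition
took_inner = lock_rgrp_once(rgrp, "rename")  # skipped, already held
if took_inner:
    rgrp.release("rename")
if took_outer:
    rgrp.release("rename")
```

Returning a flag lets each caller release only what it acquired itself, so the outer holder keeps ownership until it is done.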
      
      This fixes Red Hat bugzilla #221237
      Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] BZ 217008 fsfuzzer fix. · 6c93fd1e
      Russell Cattelan authored
      Update the quilt header comments to match the
      code changes.
      
      Change gfs2_lookup_simple to return an error in the case
      of a NULL inode.
      The callers of gfs2_lookup_simple do not check for NULL
      in the no-entry case and as such would end up dereferencing a NULL pointer.
      
      This fixes:
      http://projects.info-pull.com/mokb/MOKB-15-11-2006.html
      Signed-off-by: Russell Cattelan <cattelan@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix ordering of page disposal vs. glock_dq · 49686f71
      Steven Whitehouse authored
      In case of unlinked files with dirty pages GFS2 wasn't clearing
      the pages in quite the right order. This patch clears the pages
      earlier (before the glock_dq) to avoid the situation that the
      release of the glock results in attempting to write back data that
      has already been deallocated.
      
      This fixes Red Hat bugzilla: #220117
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Fix spin lock already unlocked bug · 4edde74e
      Patrick Caulfield authored
      I just noticed this message when testing some other changes I'd made to
      lowcomms (to use workqueues) but the problem seems to be in the current
      git trees too. I'm amazed no-one has seen it.
      
          BUG: spinlock already unlocked on CPU#1, dlm_recoverd/16868
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Fix schedule() calls · 3fb4a251
      Patrick Caulfield authored
      I was a little over-enthusiastic turning schedule() calls into cond_resched() when fixing the DLM for Andrew Morton.
      
      These four should really be calls to schedule() or the dlm can busy-wait.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix change nlink deadlock · 5509826f
      S. Wendy Cheng authored
      Bugzilla 215088
      
      Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
      partition. The gfs2_rename() apparently needs block allocation for the
      new name (into the directory) where it requires rg locks. At the same
      time, while updating the nlink count for the replaced file,
      gfs2_change_nlink() tries to return the inode meta-data back to resource
      group where it needs rg locks too. Our logic doesn't allow the same
      process (the RHEL installer) to acquire these locks recursively, which
      results in a BUG call. This only happens within the rename code path and
      only if the destination file exists before the rename operation.
      Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fail over to readpage for stuffed files · e1d5b18a
      Steven Whitehouse authored
      This is partially derived from a patch written by Russell Cattelan.
      It fixes a bug where there is a race between readpages and truncate
      by ignoring readpages for stuffed files. This is ok because a stuffed
      file will never be more than one block (minus sizeof(struct gfs2_dinode))
      in size and block size is always less than page size, so we do not lose
      anything efficiency-wise by not doing readahead for stuffed files. They
      will have already been "read ahead" by the action of reading the inode
      in, in the first place.
      
      This is the remaining part of the fix for Red Hat bugzilla #218966
      which had not yet made it upstream.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Russell Cattelan <cattelan@redhat.com>
    • [GFS2] Fix DIO deadlock · c7b33834
      Steven Whitehouse authored
      This patch fixes Red Hat bugzilla #212627 in which a deadlock occurs
      due to trying to take the i_mutex while holding a glock. The correct
      locking order is defined as i_mutex -> glock in all cases.
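The ordering rule (i_mutex first, then glock) can be illustrated with a short Python sketch. The two `threading.Lock` objects only stand in for the kernel's inode mutex and the cluster glock, and `locked_op` is a hypothetical helper; the point is that every path takes the locks in the same order, so no thread can hold one while waiting for the other in reverse.

```python
import threading

i_mutex = threading.Lock()  # stands in for the inode mutex
glock = threading.Lock()    # stands in for the cluster glock

log = []


def locked_op(name):
    """Take both locks in the documented order: i_mutex, then glock."""
    with i_mutex:
        with glock:
            log.append(name)


threads = [
    threading.Thread(target=locked_op, args=(f"op{i}",)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If one code path took `glock` first and then `i_mutex`, two threads could each hold one lock and wait forever for the other, which is the shape of deadlock the commit describes.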
      
      I've left dealing with allocating writes for later. I know that we need to do
      that, but for now this should do the trick. We don't need to take the
      i_mutex on write, because the VFS has already taken it for us. On read
      we don't need it since the glock is enough protection. The reason that
      I've made some of the checks into a separate function is that we'll need
      to do the checks again in the allocating write case eventually, so this
      is partly in preparation for this. Likewise the return value test of !=
      1 might look a bit odd and that's because we'll need a third return value
      in case of requiring an allocation.
      
      I've made the change to deferred mode on the glock to ensure flushing
      read caches on other nodes. I notice that (using blktrace to look at
      what's going on) we appear to do a better job of large I/Os than ext3
      after this patch (in terms of not splitting up the I/Os).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Wendy Cheng <wcheng@redhat.com>
    • [DLM] fs/dlm/lowcomms-tcp.c: remove 2 functions · 927255f0
      Adrian Bunk authored
      Remove the following unused functions:
      
      - lowcomms_send_message()
      - lowcomms_max_buffer_size()
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix lost flags in stub replies · 075529b5
      David Teigland authored
      When the dlm fakes an unlock/cancel reply from a failed node using a stub
      message struct, it wasn't setting the flags in the stub message.  So, in
      the process of receiving the fake message the lkb flags would be updated
      and cleared from the zero flags in the message.  The problem observed in
      tests was the loss of the USER flag which caused the dlm to think a user
      lock was a kernel lock and subsequently fail an assertion checking the
      validity of the ast/callback field.
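A simplified Python sketch of how a zero-flag stub reply can wipe a lock's flags. It assumes, for illustration only, that the receiver simply takes its flag bits from the message; `FLAG_USER`, `make_stub_reply` and `apply_reply_flags` are hypothetical names, not the dlm's.

```python
FLAG_USER = 0x1  # hypothetical bit meaning "this is a user-space lock"


def make_stub_reply(lkb_flags, copy_flags):
    """Build a fake reply for a failed node.

    copy_flags=False reproduces the bug: the stub carries zero flags.
    """
    return {"flags": lkb_flags if copy_flags else 0}


def apply_reply_flags(lkb_flags, msg):
    """Simplified receiver: the lock's flags are replaced by the message's."""
    return msg["flags"]


lkb_flags = FLAG_USER
buggy = apply_reply_flags(lkb_flags, make_stub_reply(lkb_flags, False))
fixed = apply_reply_flags(lkb_flags, make_stub_reply(lkb_flags, True))
```

With the buggy stub the USER bit is gone after the fake reply is processed; copying the flags into the stub first preserves it.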
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix receive_request() lvb copying · 8d07fd50
      David Teigland authored
      LVB's are not sent as part of new requests, but the code receiving the
      request was copying data into the lvb anyway.  The space in the message
      where it mistakenly thought the lvb lived actually contained the resource
      name, so it wound up incorrectly copying this name data into the lvb.  Fix
      is to just create the lvb, not copy junk into it.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix send_args() lvb copying · da49f36f
      David Teigland authored
      The send_args() function is used to copy parameters into a message for a
      number of different message types.  Only some of those types are set up
      beforehand (in create_message) to include space for sending lvb data.
      send_args was wrongly copying the lvb for all message types as long as the
      lock had an lvb.  This means that the lvb data was being written past the
      end of the message into unknown space.
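The shape of the fix, copying the lvb only for message types that were allocated with room for it, might look like the following Python sketch. The set of types, the sizes and the dictionary layout are assumptions made for illustration; only the names `create_message` and `send_args` come from the commit message, and these bodies are not the kernel's.

```python
LVB_LEN = 32      # hypothetical lvb size in bytes
HEADER_LEN = 16   # hypothetical fixed message header size

# Hypothetical set of message types whose buffer reserves space for an lvb.
TYPES_WITH_LVB = {"convert", "unlock", "grant"}


def create_message(mstype):
    """Allocate a message; only some types get extra room for lvb data."""
    extra = LVB_LEN if mstype in TYPES_WITH_LVB else 0
    return {"type": mstype, "buf": bytearray(HEADER_LEN + extra)}


def send_args(msg, lvb):
    """Copy the lvb into the message only if its type reserved space."""
    if lvb is None or msg["type"] not in TYPES_WITH_LVB:
        return False
    msg["buf"][HEADER_LEN:HEADER_LEN + LVB_LEN] = lvb[:LVB_LEN]
    return True


lvb = bytes([0xAB]) * LVB_LEN
request = create_message("request")
convert = create_message("convert")
copied_request = send_args(request, lvb)
copied_convert = send_args(convert, lvb)
```

In C the unconditional copy would write past the end of the buffer; a Python `bytearray` would silently grow instead, which is why the sketch checks the message type rather than relying on an overflow to show the problem.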
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] add version check · 9e971b71
      David Teigland authored
      Check if we receive a message from another lockspace member running a
      version of the dlm with an incompatible inter-node message protocol.
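One common way to implement such a check is to split the version into major and minor parts and require only the major part to match. The Python sketch below assumes a 32-bit version with the major number in the upper 16 bits; that layout and the name `protocol_compatible` are assumptions for illustration, not taken from the commit.

```python
def protocol_compatible(local_version, remote_version):
    """Treat versions as compatible when their major parts match.

    Assumption: upper 16 bits hold the major version, lower 16 the minor.
    """
    return (local_version >> 16) == (remote_version >> 16)


same_major = protocol_compatible(0x00020001, 0x00020000)
different_major = protocol_compatible(0x00020001, 0x00030000)
```

A receiver using this scheme would drop or reject any message for which the check fails instead of trying to parse it.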
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix old rcom messages · 38aa8b0c
      David Teigland authored
      A reply to a recovery message will often be received after the relevant
      recovery sequence has aborted and the next recovery sequence has begun.
      We need to ignore replies to these old messages from the previous
      recovery.  There's already a way to do this for synchronous recovery
      requests using the rc_id number, but not for async.
      
      Each recovery sequence already has a locally unique sequence number
      associated with it.  This patch adds a field to the rcom (recovery
      message) structure where this recovery sequence number can be placed,
      rc_seq.  When a node sends a reply to a recovery request, it copies the
      rc_seq number it received into rc_seq_reply.  When the first node receives
      the reply to its recovery message, it will check whether rc_seq_reply
      matches the current recovery sequence number, ls_recover_seq, and if not
      then it ignores the old reply.
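The sequence-number filter described above can be sketched in Python as follows. The field names `rc_seq`, `rc_seq_reply` and `ls_recover_seq` come from the commit message; the `Rcom` dataclass and the helper functions are hypothetical stand-ins for the kernel structures.

```python
from dataclasses import dataclass


@dataclass
class Rcom:
    """Toy recovery message carrying only the two sequence fields."""
    rc_seq: int = 0
    rc_seq_reply: int = 0


def make_request(ls_recover_seq):
    """Sender stamps the request with its current recovery sequence."""
    return Rcom(rc_seq=ls_recover_seq)


def make_reply(request, replier_recover_seq):
    """Replier echoes the sender's sequence number in rc_seq_reply."""
    return Rcom(rc_seq=replier_recover_seq, rc_seq_reply=request.rc_seq)


def accept_reply(reply, ls_recover_seq):
    """Keep the reply only if it answers the current recovery sequence."""
    return reply.rc_seq_reply == ls_recover_seq


request = make_request(ls_recover_seq=7)
reply = make_reply(request, replier_recover_seq=3)
current_ok = accept_reply(reply, ls_recover_seq=7)  # same recovery: accepted
stale_ok = accept_reply(reply, ls_recover_seq=8)    # recovery restarted: ignored
```

Because each sequence number is only locally unique, the reply is compared against the original sender's own counter, never against the replier's.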
      
      An old, inadequate approach to filtering out old replies (checking if the
      current stage of recovery has moved back to the start) has been removed
      from two spots.
      
      The protocol version number is changed to reflect the different rcom
      structures.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix resend rcom lock · dc200a88
      David Teigland authored
      There's a chance the new master of a resource hasn't learned it's the new
      master before another node sends it a lock during recovery.  The node
      sending the lock needs to resend if this happens.
      
      - A sends a master lookup for resource R to C
      - B sends a master lookup for resource R to C
      - C receives A's lookup, assigns A to be master of R and
        sends a reply back to A
      - C receives B's lookup and sends a reply back to B saying
        that A is the master
      - B receives lookup reply from C and sends its lock for R to A
      - A receives lock from B, doesn't think it's the master of R
        and sends an error back to B
      - A receives lookup reply from C and becomes master of R
      - B gets error back from A and resends its lock back to A
        (this resending is what this patch does)
      - A receives lock from B, it now sees it's the master of R
        and takes the lock
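The race in the steps above can be replayed with a small Python simulation. `Node`, `receive_lock` and `send_lock_with_resend` are hypothetical names, and the delayed lookup reply from C is modelled as a queued callback; this is a sketch of the resend idea, not the dlm code.

```python
class Node:
    """Toy node that tracks which resources it knows it masters."""

    def __init__(self, name):
        self.name = name
        self.masters = set()

    def receive_lock(self, resource):
        """Return True if the lock is taken, False for a 'not master' error."""
        return resource in self.masters


def send_lock_with_resend(pending_events, master, resource, max_tries=3):
    """Send a lock; on a 'not master' error let one pending event run, resend.

    Returns the attempt number that succeeded, or None if all tries failed.
    """
    for attempt in range(1, max_tries + 1):
        if master.receive_lock(resource):
            return attempt
        if pending_events:
            pending_events.pop(0)()  # e.g. A finally receives C's lookup reply
    return None


a = Node("A")
# C's lookup reply to A is still in flight when B's lock arrives at A.
pending = [lambda: a.masters.add("R")]
attempts = send_lock_with_resend(pending, a, "R")
```

Here the first send fails because A has not yet learned it is master of R, the lookup reply then arrives, and the resend on the second attempt succeeds.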
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] don't try to lockfs after shutdown · c3780511
      David Teigland authored
      If an fs has already been shut down, a lockfs callback should do nothing.
      An fs that's been shut down can't acquire locks or do anything with
      respect to the cluster.
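The guard amounts to an early return in the callback. A minimal Python sketch, with a hypothetical `FsState` class and `lockfs_callback` function standing in for the real filesystem state and callback:

```python
class FsState:
    """Toy filesystem state with just the two flags the sketch needs."""

    def __init__(self):
        self.shutdown = False
        self.frozen = False


def lockfs_callback(fs):
    """Freeze the fs, unless it has already been shut down."""
    if fs.shutdown:
        return False  # nothing to do: a shut-down fs cannot take locks
    fs.frozen = True
    return True


live = FsState()
dead = FsState()
dead.shutdown = True
live_result = lockfs_callback(live)
dead_result = lockfs_callback(dead)
```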
      
      Also, remove FIXME comment in withdraw function.  The missing bits of the
      withdraw procedure are now all done by user space.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  2. 04 Feb, 2007 3 commits
  3. 03 Feb, 2007 11 commits
  4. 02 Feb, 2007 10 commits