1. 26 Jul, 2010 9 commits
    • NeilBrown's avatar
      md/plug: optionally use plugger to unplug an array during resync/recovery. · 252ac522
      NeilBrown authored
      If an array doesn't have a 'queue' then md_do_sync cannot
      unplug it.
      In that case it will have a 'plugger', so make that available
      to the mddev, and use it to unplug the array if needed.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      252ac522
    • NeilBrown's avatar
      md/raid5: add simple plugging infrastructure. · 2ac87401
      NeilBrown authored
      md/raid5 uses the plugging infrastructure provided by the block layer
      and 'struct request_queue'.  However when we plug raid5 under dm there
      is no request queue so we cannot use that.
      
      So create a similar infrastructure that is much lighter weight and use
      it for raid5.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      2ac87401
    • NeilBrown's avatar
      md/raid5: export is_congested test · 11d8a6e3
      NeilBrown authored
      the dm module will need this for dm-raid45.
      
      Also only access ->queue->backing_dev_info->congested_fn
      if ->queue actually exists.  It won't in a dm target.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      11d8a6e3
    • NeilBrown's avatar
      raid5: Don't set read-ahead when there is no queue · 4a5add49
      NeilBrown authored
      dm-raid456 does not provide a 'queue' for raid5 to use,
      so we must make raid5 stop depending on the queue.
      
      First: read_ahead
      dm handles read-ahead adjustment fully in userspace, so
      simply don't do any readahead adjustments if there is
      no queue.
      
      Also re-arrange code slightly so all the accesses to ->queue are
      together.
      
      Finally, move the blk_queue_merge_bvec function into the 'if' as
      the ->split_io setting in dm-raid456 has the same effect.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      4a5add49
    • NeilBrown's avatar
      md: add support for raising dm events. · 768a418d
      NeilBrown authored
      dm uses scheduled work to raise events to user-space.
      So allow md device to have work_structs and schedule them on an error.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      768a418d
    • NeilBrown's avatar
      md: export various start/stop interfaces · 390ee602
      NeilBrown authored
      export entry points for starting and stopping md arrays.
      This will be used by a module to make md/raid5 work under
      dm.
      Also stop calling md_stop_writes from md_stop, as that won't
      work well with dm - it will want to call the two separately.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      390ee602
    • NeilBrown's avatar
      md: split out md_rdev_init · e8bb9a83
      NeilBrown authored
      This functionality will be needed separately in a subsequent patch, so
      split it into it's own exported function.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      e8bb9a83
    • NeilBrown's avatar
      md: be more careful setting MD_CHANGE_CLEAN · 676e42d8
      NeilBrown authored
      When MD_CHANGE_CLEAN is set we might block in md_write_start.
      So we should only set it when fairly sure that something will clear
      it.
      
      There are two places where it is set so as to encourage a metadata
      update to record the progress of resync/recovery.  This should only
      be done if the internal metadata update mechanisms are in use, which
      can be tested by by inspecting '->persistent'.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      676e42d8
    • NeilBrown's avatar
      md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk · f4be6b43
      NeilBrown authored
      We will shortly allow md devices with no gendisk (they are attached to
      a dm-target instead).  That will cause mdname() to return 'mdX'.
      There is one place where mdname really needs to be unique: when
      creating the name for a slab cache.
      So in that case, if there is no gendisk, you the address of the mddev
      formatted in HEX to provide a unique name.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      f4be6b43
  2. 21 Jul, 2010 2 commits
  3. 19 Jul, 2010 9 commits
  4. 18 Jul, 2010 5 commits
  5. 16 Jul, 2010 9 commits
  6. 15 Jul, 2010 6 commits
    • Jan Kara's avatar
      jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09
      Jan Kara authored
      OCFS2 uses t_commit trigger to compute and store checksum of the just
      committed blocks. When a buffer has b_frozen_data, checksum is computed
      for it instead of b_data but this can result in an old checksum being
      written to the filesystem in the following scenario:
      
      1) transaction1 is opened
      2) handle1 is opened
      3) journal_access(handle1, bh)
          - This sets jh->b_transaction to transaction1
      4) modify(bh)
      5) journal_dirty(handle1, bh)
      6) handle1 is closed
      7) start committing transaction1, opening transaction2
      8) handle2 is opened
      9) journal_access(handle2, bh)
          - This copies off b_frozen_data to make it safe for transaction1 to commit.
            jh->b_next_transaction is set to transaction2.
      10) jbd2_journal_write_metadata() checksums b_frozen_data
      11) the journal correctly writes b_frozen_data to the disk journal
      12) handle2 is closed
          - There was no dirty call for the bh on handle2, so it is never queued for
            any more journal operation
      13) Checkpointing finally happens, and it just spools the bh via normal buffer
      writeback.  This will write b_data, which was never triggered on and thus
      contains a wrong (old) checksum.
      
      This patch fixes the problem by calling the trigger at the moment data is
      frozen for journal commit - i.e., either when b_frozen_data is created by
      do_get_write_access or just before we write a buffer to the log if
      b_frozen_data does not exist. We also rename the trigger to t_frozen as
      that better describes when it is called.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarMark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarJoel Becker <joel.becker@oracle.com>
      13ceef09
    • Wengang Wang's avatar
      ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node · a39953dd
      Wengang Wang authored
      For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set
      before sending DLM_MIG_LOCKRES_MSG message to the target. We are using
      dlm_migration_can_proceed() for that purpose.  However, if the node is
      down, dlm_migration_can_proceed() will also return "go ahead".  In this
      rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove
      the BUG_ON() that trips over this condition.
      Signed-off-by: default avatarWengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: default avatarJoel Becker <joel.becker@oracle.com>
      a39953dd
    • Tao Ma's avatar
      ocfs2: Don't duplicate pages past i_size during CoW. · f5e27b6d
      Tao Ma authored
      During CoW, the pages after i_size don't contain valid data, so there's
      no need to read and duplicate them.
      Signed-off-by: default avatarTao Ma <tao.ma@oracle.com>
      Signed-off-by: default avatarJoel Becker <joel.becker@oracle.com>
      f5e27b6d
    • Thomas Gleixner's avatar
      x86: Force HPET readback_cmp for all ATI chipsets · 08be9796
      Thomas Gleixner authored
      commit 30a564be (x86, hpet: Restrict read back to affected ATI
      chipset) restricted the workaround for the HPET bug to SMX00
      chipsets. This was reasonable as those were the only ones against
      which we ever got a bug report.
      
      Stephan Wolf reported now that this patch breaks his IXP400 based
      machine. Though it's confirmed to work on other IXP400 based systems.
      
      To error out on the safe side, we force the HPET readback workaround
      for all ATI SMbus class chipsets.
      Reported-by: default avatarStephan Wolf <stephan@letzte-bankreihe.de>
      LKML-Reference: <alpine.LFD.2.00.1007142134140.3321@localhost.localdomain>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarStephan Wolf <stephan@letzte-bankreihe.de>
      Acked-by: default avatarBorislav Petkov <borislav.petkov@amd.com>
      08be9796
    • Bob Peterson's avatar
      GFS2: rename causes kernel Oops · 728a756b
      Bob Peterson authored
      This patch fixes a kernel Oops in the GFS2 rename code.
      
      The problem was in the way the gfs2 directory code was trying
      to re-use sentinel directory entries.
      
      In the failing case, gfs2's rename function was renaming a
      file to another name that had the same non-trivial length.
      The file being renamed happened to be the first directory
      entry on the leaf block.
      
      First, the rename code (gfs2_rename in ops_inode.c) found the
      original directory entry and decided it could do its job by
      simply replacing the directory entry with another.  Therefore
      it determined correctly that no block allocations were needed.
      
      Next, the rename code deleted the old directory entry prior to
      replacing it with the new name.  Therefore, the soon-to-be
      replaced directory entry was temporarily made into a directory
      entry "sentinel" or a place holder at the start of a leaf block.
      
      Lastly, it went to re-add the replacement directory entry in
      that leaf block.  However, when gfs2_dirent_find_space was
      looking for space in the leaf block, it used the wrong value
      for the sentinel.  That threw off its calculations so later
      it decides it can't really re-use the sentinel and therefore
      must allocate a new leaf block.  But because it previously decided
      to re-use the directory entry, it didn't waste the time to
      grab a new block allocation for the inode.  Therefore, the
      inode's i_alloc pointer was still NULL and it crashes trying to
      reference it.
      
      In the case of sentinel directory entries, the entire dirent is
      reused, not just the "free space" portion of it, and therefore
      the function gfs2_dirent_find_space should use the value 0
      rather than GFS2_DIRENT_SIZE(0) for the actual dirent size.
      
      Fixing this calculation enables the reproducer programs to work
      properly.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      728a756b
    • Abhijith Das's avatar
      GFS2: BUG in gfs2_adjust_quota · 8b421601
      Abhijith Das authored
      HighMem pages on i686 do not get mapped to the buffer_heads and this was
      causing a NULL pointer dereference when we were trying to memset page buffers
      to zero.
      We now use zero_user() that kmaps the page and directly manipulates page data.
      This patch also fixes a boundary condition that was incorrect.
      Signed-off-by: default avatarAbhi Das <adas@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      8b421601