1. 24 Feb, 2011 2 commits
    • NeilBrown's avatar
      md: Fix - again - partition detection when array becomes active · f0b4f7e2
      NeilBrown authored
      Revert
          b821eaa5
      and
          f3b99be1
      
      When I wrote the first of these I had a wrong idea about the
      lifetime of 'struct block_device'.  It can disappear at any time that
      the block device is not open if it falls out of the inode cache.
      
      So relying on the 'size' recorded with it to detect when the
      device size has changed and so we need to revalidate, is wrong.
      
      Rather, we really do need the 'changed' attribute stored directly in
      the mddev and set/tested as appropriate.
      
      Without this patch, a sequence of:
         mknod / open / close / unlink
      
      (which can cause a block_device to be created and then destroyed)
      will result in a rescan of the partition table and consequence removal
      and addition of partitions.
      Several of these in a row can get udev racing to create and unlink and
      other code can get confused.
      
      With the patch, the rescan is only performed when needed and so there
      are no races.
      
      This is suitable for any stable kernel from 2.6.35.
      Reported-by: default avatar"Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Cc: stable@kernel.org
      f0b4f7e2
    • NeilBrown's avatar
      Fix over-zealous flush_disk when changing device size. · 93b270f7
      NeilBrown authored
      There are two cases when we call flush_disk.
      In one, the device has disappeared (check_disk_change) so any
      data will hold becomes irrelevant.
      In the oter, the device has changed size (check_disk_size_change)
      so data we hold may be irrelevant.
      
      In both cases it makes sense to discard any 'clean' buffers,
      so they will be read back from the device if needed.
      
      In the former case it makes sense to discard 'dirty' buffers
      as there will never be anywhere safe to write the data.  In the
      second case it *does*not* make sense to discard dirty buffers
      as that will lead to file system corruption when you simply enlarge
      the containing devices.
      
      flush_disk calls __invalidate_devices.
      __invalidate_device calls both invalidate_inodes and invalidate_bdev.
      
      invalidate_inodes *does* discard I_DIRTY inodes and this does lead
      to fs corruption.
      
      invalidate_bev *does*not* discard dirty pages, but I don't really care
      about that at present.
      
      So this patch adds a flag to __invalidate_device (calling it
      __invalidate_device2) to indicate whether dirty buffers should be
      killed, and this is passed to invalidate_inodes which can choose to
      skip dirty inodes.
      
      flusk_disk then passes true from check_disk_change and false from
      check_disk_size_change.
      
      dm avoids tripping over this problem by calling i_size_write directly
      rathher than using check_disk_size_change.
      
      md does use check_disk_size_change and so is affected.
      
      This regression was introduced by commit 608aeef1 which causes
      check_disk_size_change to call flush_disk, so it is suitable for any
      kernel since 2.6.27.
      
      Cc: stable@kernel.org
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Andrew Patterson <andrew.patterson@hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      93b270f7
  2. 21 Feb, 2011 1 commit
    • NeilBrown's avatar
      md: avoid spinlock problem in blk_throtl_exit · da9cf505
      NeilBrown authored
      blk_throtl_exit assumes that ->queue_lock still exists,
      so make sure that it does.
      To do this, we stop redirecting ->queue_lock to conf->device_lock
      and leave it pointing where it is initialised - __queue_lock.
      
      As the blk_plug functions check the ->queue_lock is held, we now
      take that spin_lock explicitly around the plug functions.  We don't
      need the locking, just the warning removal.
      
      This is needed for any kernel with the blk_throtl code, which is
      which is 2.6.37 and later.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      da9cf505
  3. 16 Feb, 2011 2 commits
    • NeilBrown's avatar
      md: correctly handle probe of an 'mdp' device. · 8f5f02c4
      NeilBrown authored
      'mdp' devices are md devices with preallocated device numbers
      for partitions. As such it is possible to mknod and open a partition
      before opening the whole device.
      
      this causes  md_probe() to be called with a device number of a
      partition, which in-turn calls mddev_find with such a number.
      
      However mddev_find expects the number of a 'whole device' and
      does the wrong thing with partition numbers.
      
      So add code to mddev_find to remove the 'partition' part of
      a device number and just work with the 'whole device'.
      
      This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652
      
      Reported-by: hkmaly@bigfoot.com
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Cc: <stable@kernel.org>
      8f5f02c4
    • NeilBrown's avatar
      md: don't set_capacity before array is active. · cbe6ef1d
      NeilBrown authored
      If the desired size of an array is set (via sysfs) before the array is
      active (which is the normal sequence), we currrently call set_capacity
      immediately.
      This means that a subsequent 'open' (as can be caused by some
      udev-triggers program) will notice the new size and try to probe for
      partitions.  However as the array isn't quite ready yet the read will
      fail.  Then when the array is read, as the size doesn't change again
      we don't try to re-probe.
      
      So when setting array size via sysfs, only call set_capacity if the
      array is already active.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      cbe6ef1d
  4. 13 Feb, 2011 1 commit
  5. 08 Feb, 2011 1 commit
    • Krzysztof Wojcik's avatar
      FIX: md: process hangs at wait_barrier after 0->10 takeover · 02214dc5
      Krzysztof Wojcik authored
      Following symptoms were observed:
      1. After raid0->raid10 takeover operation we have array with 2
      missing disks.
      When we add disk for rebuild, recovery process starts as expected
      but it does not finish- it stops at about 90%, md126_resync process
      hangs in "D" state.
      2. Similar behavior is when we have mounted raid0 array and we
      execute takeover to raid10. After this when we try to unmount array-
      it causes process umount hangs in "D"
      
      In scenarios above processes hang at the same function- wait_barrier
      in raid10.c.
      Process waits in macro "wait_event_lock_irq" until the
      "!conf->barrier" condition will be true.
      In scenarios above it never happens.
      
      Reason was that at the end of level_store, after calling pers->run,
      we call mddev_resume. This calls pers->quiesce(mddev, 0) with
      RAID10, that calls lower_barrier.
      However raise_barrier hadn't been called on that 'conf' yet,
      so conf->barrier becomes negative, which is bad.
      
      This patch introduces setting conf->barrier=1 after takeover
      operation. It prevents to become barrier negative after call
      lower_barrier().
      Signed-off-by: default avatarKrzysztof Wojcik <krzysztof.wojcik@intel.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      02214dc5
  6. 07 Feb, 2011 1 commit
    • Chris Mason's avatar
      md_make_request: don't touch the bio after calling make_request · e91ece55
      Chris Mason authored
      md_make_request was calling bio_sectors() for part_stat_add
      after it was calling the make_request function.  This is
      bad because the make_request function can free the bio and
      because the bi_size field can change around.
      
      The fix here was suggested by Jens Axboe.  It saves the
      sector count before the make_request call.  I hit this
      with CONFIG_DEBUG_PAGEALLOC turned on while trying to break
      his pretty fusionio card.
      
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      e91ece55
  7. 02 Feb, 2011 1 commit
  8. 31 Jan, 2011 8 commits
    • NeilBrown's avatar
      md: don't clear curr_resync_completed at end of resync. · 7281f812
      NeilBrown authored
      There is no need to set this to zero at this point.  It will be
      set to zero by remove_and_add_spares or at the start of
      md_do_sync at the latest.
      And setting it to zero before MD_RECOVERY_RUNNING is cleared can
      make a 'zero' appear briefly in the 'sync_completed' sysfs attribute
      just as resync is finishing.
      
      So simply remove this setting to zero.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      7281f812
    • NeilBrown's avatar
      md: Don't use remove_and_add_spares to remove failed devices from a read-only array · a8c42c7f
      NeilBrown authored
      remove_and_add_spares is called in two places where the needs really
      are very different.
      remove_and_add_spares should not be called on an array which is about
      to be reshaped as some extra devices might have been manually added
      and that would remove them.  However if the array is 'read-auto',
      that will currently happen, which is bad.
      
      So in the 'ro != 0' case don't call remove_and_add_spares but simply
      remove the failed devices as the comment suggests is needed.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a8c42c7f
    • Krzysztof Wojcik's avatar
      Add raid1->raid0 takeover support · fc3a08b8
      Krzysztof Wojcik authored
      This patch introduces raid 1 to raid0 takeover operation
      in kernel space.
      Signed-off-by: default avatarKrzysztof Wojcik <krzysztof.wojcik@intel.com>
      Signed-off-by: default avatarNeil Brown <neilb@nbeee.brown>
      fc3a08b8
    • NeilBrown's avatar
      md: Remove the AllReserved flag for component devices. · f21e9ff7
      NeilBrown authored
      This flag is not needed and is used badly.
      
      Devices that are included in a native-metadata array are reserved
      exclusively for that array - and currently have AllReserved set.
      They all are bd_claimed for the rdev and so cannot be shared.
      
      Devices that are included in external-metadata arrays can be shared
      among multiple arrays - providing there is no overlap.
      These are bd_claimed for md in general - not for a particular rdev.
      
      When changing the amount of a device that is used in an array we need
      to check for overlap.  This currently includes a check on AllReserved
      So even without overlap, sharing with an AllReserved device is not
      allowed.
      However the bd_claim usage already precludes sharing with these
      devices, so the test on AllReserved is not needed.  And in fact it is
      wrong.
      
      As this is the only use of AllReserved, simply remove all usage and
      definition of AllReserved.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      f21e9ff7
    • NeilBrown's avatar
      md: don't abort checking spares as soon as one cannot be added. · 50da0840
      NeilBrown authored
      As spares can be added manually before a reshape starts, we need to
      find them all to mark some of them as in_sync.
      
      Previously we would abort looking for spares when we found an
      unallocated spare what could not be added to the array (implying there
      was no room for new spares).  However already-added spares could be
      later in the list, so we need to keep searching.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      50da0840
    • NeilBrown's avatar
      md: fix the test for finding spares in raid5_start_reshape. · 469518a3
      NeilBrown authored
      As spares can be added to the array before the reshape is started,
      we need to find and count them when checking there are enough.
      The array could have been degraded, so we need to check all devices,
      no just those out side of the range of devices in the array before
      the reshape.
      
      So instead of checking the index, check the In_sync flag as that
      reliably tells if the device is a spare or this purpose.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      469518a3
    • NeilBrown's avatar
      md: simplify some 'if' conditionals in raid5_start_reshape. · 87a8dec9
      NeilBrown authored
      There are two consecutive 'if' statements.
      
       if (mddev->delta_disks >= 0)
            ....
       if (mddev->delta_disks > 0)
      
      The code in the second is equally valid if delta_disks == 0, and these
      two statements are the only place that 'added_devices' is used.
      
      So make them a single if statement, make added_devices a local
      variable, and re-indent it all.
      
      No functional change.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      87a8dec9
    • NeilBrown's avatar
      md: revert change to raid_disks on failure. · de171cb9
      NeilBrown authored
      If we try to update_raid_disks and it fails, we should put
      'delta_disks' back to zero.  This is important because some code,
      such as slot_store, assumes that delta_disks has been validated.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      de171cb9
  9. 28 Jan, 2011 8 commits
  10. 27 Jan, 2011 14 commits
  11. 26 Jan, 2011 1 commit