1. 19 Mar, 2012 9 commits
    • md/raid10 - support resizing some RAID10 arrays. · 006a09a0
      NeilBrown authored
      'resizing' an array in this context means making use of extra
      space that has become available in component devices, not adding new
      devices.
      It also includes shrinking the array to take up less space on the
      component devices.
      
      This is not supported for arrays with a 'far' layout.  However
      for 'near' and 'offset' layout arrays, adding and removing space at
      the end of the devices is easy to support, and this patch provides
      that support.
      Signed-off-by: NeilBrown <neilb@suse.de>
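      A rough sketch of why space at the end of the devices is easy to
      use for 'near' but not for 'far', assuming a simplified placement
      model.  The struct and function names are hypothetical and this is
      not the kernel's mapping code; the 'offset' layout is not modelled.
      In this model the 'near' position of a chunk never depends on the
      device size, while the 'far' position of the second copy does.

```c
/* Simplified model of RAID10 chunk placement; not the kernel's code. */
struct chunk_loc {
    unsigned dev;        /* member device index */
    unsigned long row;   /* chunk-sized row within that device */
};

/* 'near' layout: the copies of a chunk occupy adjacent slots in a stripe.
 * The result does not depend on how large the devices are. */
struct chunk_loc near_loc(unsigned long chunk, unsigned copy,
                          unsigned ndevs, unsigned copies)
{
    unsigned long slot = chunk * copies + copy;
    struct chunk_loc loc = { (unsigned)(slot % ndevs), slot / ndevs };
    return loc;
}

/* 'far' layout: copy k is shifted k devices along and k * stride rows
 * down, where stride is derived from the device size (dev_rows). */
struct chunk_loc far_loc(unsigned long chunk, unsigned copy,
                         unsigned ndevs, unsigned copies,
                         unsigned long dev_rows)
{
    unsigned long stride = dev_rows / copies;
    struct chunk_loc loc = {
        (unsigned)((chunk % ndevs + copy) % ndevs),
        chunk / ndevs + copy * stride
    };
    return loc;
}
```

      Calling far_loc() for the same chunk with two different dev_rows
      values gives two different rows, so growing a 'far' array would
      mean moving existing data, whereas near_loc() is unaffected.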
    • md/raid1: handle merge_bvec_fn in member devices. · 6b740b8d
      NeilBrown authored
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So create a raid1 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      provided no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
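      A minimal sketch of "check that function in children as well" for
      a mirror, assuming a much simplified callback signature.  struct
      member, mirror_mergeable() and the two example limits are
      hypothetical names, not the kernel's API.  Since a write goes to
      every mirror, the usable size is the smallest any child accepts.

```c
#include <stddef.h>

/* Hypothetical stand-in for a member device.  merge_fn reports how many
 * of `len` bytes the device accepts at `sector`; NULL means no limit. */
struct member {
    int (*merge_fn)(unsigned long long sector, int len);
};

/* Example child limits, for illustration only. */
int limit_4096(unsigned long long sector, int len)
{
    (void)sector;
    return len < 4096 ? len : 4096;
}

int limit_512(unsigned long long sector, int len)
{
    (void)sector;
    return len < 512 ? len : 512;
}

/* Ask every child and keep the most restrictive answer. */
int mirror_mergeable(const struct member *members, size_t n,
                     unsigned long long sector, int len)
{
    int max = len;

    for (size_t i = 0; i < n; i++) {
        if (members[i].merge_fn != NULL) {
            int got = members[i].merge_fn(sector, len);
            if (got < max)
                max = got;
        }
    }
    return max;
}
```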
    • md/raid10: handle merge_bvec_fn in member devices. · 050b6615
      NeilBrown authored
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So enhance the raid10 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      provided no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: add proper merge_bvec handling to RAID0 and Linear. · ba13da47
      NeilBrown authored
      These personalities currently set a max request size of one page
      when any member device has a merge_bvec_fn because they don't
      bother to call that function.
      
      This causes extra work in splitting and combining requests.
      
      So make the extra effort to call the merge_bvec_fn when it exists
      so that we end up with larger requests out the bottom.
      Signed-off-by: NeilBrown <neilb@suse.de>
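      A hedged sketch of the Linear case, assuming a simplified callback
      signature and 512-byte sectors.  struct linear_member,
      linear_mergeable_sketch() and limit_1024() are hypothetical names.
      Unlike a mirror, a request here lands on exactly one member, so
      only that member's function needs to be consulted, after clamping
      to the space left on it.

```c
#include <stddef.h>

/* Hypothetical member of a linear array: it covers array sectors up to
 * (but not including) end_sector.  merge_fn may be NULL (no limit). */
struct linear_member {
    unsigned long long end_sector;
    int (*merge_fn)(unsigned long long local_sector, int len);
};

/* Example child limit, for illustration only. */
int limit_1024(unsigned long long local_sector, int len)
{
    (void)local_sector;
    return len < 1024 ? len : 1024;
}

/* Find the member holding `sector`, clamp to the room left on it, then
 * let that member's merge_fn shrink the answer further. */
int linear_mergeable_sketch(const struct linear_member *m, size_t n,
                            unsigned long long sector, int len)
{
    unsigned long long start = 0;

    for (size_t i = 0; i < n; i++) {
        if (sector < m[i].end_sector) {
            unsigned long long room = (m[i].end_sector - sector) * 512ULL;
            int max = len;

            if (room < (unsigned long long)max)
                max = (int)room;
            if (m[i].merge_fn != NULL) {
                int got = m[i].merge_fn(sector - start, max);
                if (got < max)
                    max = got;
            }
            return max;
        }
        start = m[i].end_sector;
    }
    return 0; /* sector lies beyond the end of the array */
}
```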
    • md: tidy up rdev_for_each usage. · dafb20fa
      NeilBrown authored
      md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
      mddev.  However it uses the 'safe' version of list_for_each_entry,
      and so requires the extra variable, but doesn't include 'safe' in the
      name, which is useful documentation.
      
      Consequently some places use this safe version without needing it, and
      many use an explicit list_for_each_entry.
      
      So:
       - rename rdev_for_each to rdev_for_each_safe
       - create a new rdev_for_each which uses the plain
         list_for_each_entry,
       - use the 'safe' version only where needed, and convert all other
         list_for_each_entry calls to use rdev_for_each.
      Signed-off-by: NeilBrown <neilb@suse.de>
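      The difference between the two iterators can be sketched in
      user space, assuming a singly linked list in place of the kernel's
      list_head.  The macro names mirror the commit, but the definitions
      and the helper functions here are illustrative only.  The 'safe'
      form needs the extra variable because it saves the next pointer
      before the loop body runs, so the body may remove the current
      entry.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct md_rdev on a singly linked list. */
struct rdev {
    int nr;
    struct rdev *next;
};

/* Plain iteration: reads pos->next after the body, so the body must
 * not free or unlink pos. */
#define rdev_for_each(pos, head) \
    for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/* Safe iteration: tmp holds the next entry before the body runs. */
#define rdev_for_each_safe(pos, tmp, head) \
    for ((pos) = (head); \
         (pos) != NULL && ((tmp) = (pos)->next, 1); \
         (pos) = (tmp))

/* Prepend a new entry; returns the new head (or the old one if
 * allocation fails). */
struct rdev *rdev_push(struct rdev *head, int nr)
{
    struct rdev *r = malloc(sizeof(*r));

    if (r == NULL)
        return head;
    r->nr = nr;
    r->next = head;
    return r;
}

/* Read-only walk: the plain iterator is enough. */
int rdev_count(struct rdev *head)
{
    struct rdev *pos;
    int n = 0;

    rdev_for_each(pos, head)
        n++;
    return n;
}

/* Entries are freed during the walk: the safe iterator is required. */
int rdev_free_all(struct rdev *head)
{
    struct rdev *pos, *tmp;
    int n = 0;

    rdev_for_each_safe(pos, tmp, head) {
        free(pos);
        n++;
    }
    return n;
}
```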
    • md/raid1,raid10: avoid deadlock during resync/recovery. · d6b42dcb
      NeilBrown authored
      If RAID1 or RAID10 is used under LVM or some other stacking
      block device, it is possible to enter a deadlock during
      resync or recovery.
      This can happen if the upper level block device creates
      two requests to the RAID1 or RAID10.  The first request gets
      processed, blocks recovery and queues requests for the underlying
      devices in current->bio_list.  A resync request then starts,
      which will wait for those requests and block new IO.
      
      But then the second request to the RAID1/10 will be attempted
      and it cannot progress until the resync request completes,
      which cannot progress until the underlying device requests complete,
      which are on a queue behind that second request.
      
      So allow that second request to proceed even though there is
      a resync request about to start.
      
      This is suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ray Morris <support@bettercgi.com>
      Tested-by: Ray Morris <support@bettercgi.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
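      The change in the wait condition can be sketched as a pair of
      predicates.  This is only a model: struct barrier_state and both
      function names are hypothetical, and the nr_pending check is an
      assumption about how "allow that second request to proceed" might
      be expressed, not a quote from the patch.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the array's barrier bookkeeping. */
struct barrier_state {
    int barrier;     /* > 0 while a resync request wants exclusive access */
    int nr_pending;  /* normal requests already in flight */
};

/* Old behaviour in this model: any raised barrier makes new IO wait. */
bool must_wait_old(const struct barrier_state *s)
{
    return s->barrier > 0;
}

/* New behaviour in this model: a caller that already has sub-requests
 * queued on its own list is let through, because the resync is waiting
 * for exactly those queued requests to complete. */
bool must_wait_new(const struct barrier_state *s, bool caller_has_queued_bios)
{
    if (s->barrier == 0)
        return false;
    if (s->nr_pending > 0 && caller_has_queued_bios)
        return false;
    return true;
}
```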
    • md/bitmap: ensure to load bitmap when creating via sysfs. · 4474ca42
      NeilBrown authored
      When commit 69e51b44 (md/bitmap:  separate out loading a bitmap...)
      created bitmap_load, it missed calling it after bitmap_create when a
      bitmap is created through the sysfs interface.
      So if a bitmap is added this way, we don't allocate memory properly
      and can crash.
      
      This is suitable for any -stable release since 2.6.35.
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: don't set md arrays to readonly on shutdown. · c744a65c
      NeilBrown authored
      It seems that with recent kernels, writeback can still be happening
      while shutdown is happening, and consequently data can be written
      after the md reboot notifier switches all arrays to read-only.
      This causes a BUG.
      
      So don't switch them to read-only - just mark them clean and
      set 'safemode' to '2', which means that immediately after any
      write the array will be switched back to 'clean'.
      
      This could result in the shutdown happening while the array is marked
      dirty, thus forcing a resync on reboot.  However if you reboot
      without performing a "sync" first, you get to keep both halves.
      
      This is suitable for any stable kernel (though there might be some
      conflicts with obvious fixes in earlier kernels).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: allow re-add to failed arrays. · dc10c643
      NeilBrown authored
      When an array is failed (some data inaccessible) then there is no
      point attempting to add a spare as it could not possibly be recovered.
      
      However there may be value in re-adding a recently removed device.
      e.g. if there is a write-intent-bitmap and it is clear, then access
      to the data could be restored by this action.
      
      So don't reject a re-add to a failed array for RAID10 and RAID5 (the
      only array types that check for a failed array).
      Signed-off-by: NeilBrown <neilb@suse.de>
  2. 13 Mar, 2012 5 commits
  3. 10 Mar, 2012 4 commits
    • Linux 3.3-rc7 · fde7d904
      Linus Torvalds authored
    • aio: fix the "too late munmap()" race · c7b28555
      Al Viro authored
      Current code has put_ioctx() called asynchronously from aio_fput_routine();
      that's done *after* we have killed the request that used to pin ioctx,
      so there's nothing to stop io_destroy() waiting in wait_for_all_aios()
      from progressing.  As a result, we can end up with async call of
      put_ioctx() being the last one and possibly happening during exit_mmap()
      or elf_core_dump(), neither of which expects stray munmap() being done
      to them...
      
      We do need to prevent _freeing_ ioctx until aio_fput_routine() is done
      with that, but that's all we care about - neither io_destroy() nor
      exit_aio() will progress past wait_for_all_aios() until aio_fput_routine()
      does really_put_req(), so the ioctx teardown won't be done until then
      and we don't care about the contents of ioctx past that point.
      
      Since actual freeing of these suckers is RCU-delayed, we don't need to
      bump ioctx refcount when request goes into list for async removal.
      All we need is rcu_read_lock held just over the ->ctx_lock-protected
      area in aio_fput_routine().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: fix io_setup/io_destroy race · 86b62a2c
      Al Viro authored
      Have ioctx_alloc() return an extra reference, so that caller would drop it
      on success and not bother with re-grabbing it on failure exit.  The current
      code is obviously broken - io_destroy() from another thread that managed
      to guess the address io_setup() would've returned would free ioctx right
      under us; gets especially interesting if aio_context_t * we pass to
      io_setup() points to PROT_READ mapping, so put_user() fails and we end
      up doing io_destroy() on kioctx another thread has just got freed...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
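      The "extra reference" idea can be sketched with a plain counter,
      assuming a single-threaded user-space model.  struct ctx,
      ctx_alloc() and ctx_put() are hypothetical names and stand in for
      the kernel's kioctx handling only loosely.  The allocator hands
      back an object holding two references, one for the lookup table
      and one for the caller, so a destroy from elsewhere cannot free it
      while the caller is still using it.

```c
#include <stdlib.h>

/* Hypothetical reference-counted context. */
struct ctx {
    int refs;
};

/* Returns a context with two references: one owned by the (imagined)
 * lookup table, one owned by the caller.  NULL on allocation failure. */
struct ctx *ctx_alloc(void)
{
    struct ctx *c = malloc(sizeof(*c));

    if (c != NULL)
        c->refs = 2;
    return c;
}

/* Drops one reference; frees the context and returns 1 when it was the
 * last one, otherwise returns 0. */
int ctx_put(struct ctx *c)
{
    if (--c->refs == 0) {
        free(c);
        return 1;
    }
    return 0;
}
```

      On success the caller simply drops its own reference; on failure it
      tears the context down and then drops that same reference, without
      having to look the context up again.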
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 86e06008
      Linus Torvalds authored
      Pull btrfs updates from Chris Mason:
       "I have two additional btrfs fixes in my for-linus branch.  One is
        a casting error that leads to memory corruption on i386 during scrub,
        and the other fixes a corner case in the backref walking code (also
        triggered by scrub)."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix casting error in scrub reada code
        btrfs: fix locking issues in find_parent_nodes()
  4. 09 Mar, 2012 14 commits
  5. 08 Mar, 2012 8 commits