1. 09 Dec, 2010 5 commits
    • md: protect against NULL reference when waiting to start a raid10. · 589a594b
      NeilBrown authored
      When we fail to start a raid10 for some reason, we call
      md_unregister_thread to kill the thread that was created.
      
      Unfortunately md_thread() will then make one call into the handler
      (raid10d) even though md_wakeup_thread has not been called.  This is
      not safe: because md_unregister_thread is called after mddev->private
      has been set to NULL, it will definitely cause a NULL dereference.
      
      So fix this at both ends:
       - md_thread should only call the handler if THREAD_WAKEUP has been
         set.
       - raid10 should call md_unregister_thread before setting things
         to NULL just like all the other raid modules do.
      
      This is applicable to 2.6.35 and later.
      
      Cc: stable@kernel.org
      Reported-by: "Citizen" <citizen_lee@thecus.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
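The first half of the fix described above (only call the handler when a wakeup was requested) can be sketched in userspace C. This is a toy model of the rule as the commit message states it, not the kernel code: `toy_thread`, `toy_wakeup`, `toy_handler` and `toy_thread_iteration` are hypothetical names, and the flag is a plain integer rather than an atomic bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's definitions. */
#define THREAD_WAKEUP 0x1u

struct toy_thread {
    unsigned int flags;     /* THREAD_WAKEUP is set by toy_wakeup() */
    void *private_data;     /* stands in for mddev->private */
    int handler_calls;      /* how many times the handler has run */
};

/* The handler dereferences private_data, so it must not run after teardown. */
static void toy_handler(struct toy_thread *t)
{
    assert(t->private_data != NULL);
    t->handler_calls++;
}

/* Request one run of the handler. */
static void toy_wakeup(struct toy_thread *t)
{
    t->flags |= THREAD_WAKEUP;
}

/* One pass of the thread loop: run the handler only if a wakeup is pending. */
static void toy_thread_iteration(struct toy_thread *t)
{
    if (t->flags & THREAD_WAKEUP) {
        t->flags &= ~THREAD_WAKEUP;
        toy_handler(t);
    }
}
```

In this model, a pass through the loop after `private_data` has been cleared is harmless as long as nobody called `toy_wakeup()` in between, which is the property the message asks of md_thread.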
    • md: fix bug with re-adding of partially recovered device. · 1a855a06
      NeilBrown authored
      With v0.90 metadata, a hot-spare does not become a full member of the
      array until recovery is complete.  So if we re-add such a device to
      the array, we know that all of it is as up-to-date as the event count
      would suggest, and so a bitmap-based recovery is possible.
      
      However, with v1.x metadata, the hot-spare immediately becomes a full
      member of the array, but it records how much of the device has been
      recovered.  If the array is stopped and re-assembled, recovery starts
      from this point.
      
      When such a device is hot-added to an array we currently lose the 'how
      much is recovered' information and incorrectly include it as a full
      in-sync member (after bitmap-based fixup).
      This is wrong and unsafe and could corrupt data.
      
      So be more careful about setting saved_raid_disk - which is what
      guides the re-adding of devices back into an array.
      The new code matches the code in slot_store which does a similar
      thing, which is encouraging.
      
      This is suitable for any -stable kernel.
      Reported-by: "Dailey, Nate" <Nate.Dailey@stratus.com>
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix possible deadlock in handling flush requests. · a035fc3e
      NeilBrown authored
      As recorded in
          https://bugzilla.kernel.org/show_bug.cgi?id=24012
      
      it is possible for a flush request through md to hang.  This is due to
      an interaction between the recursion avoidance in
      generic_make_request, the insistence in md of only having one flush
      active at a time, and the possibility of dm (or md) submitting two
      flush requests to a device from the one generic_make_request.
      
      If a generic_make_request call into dm causes two flush requests to be
      queued (as happens if the dm table has two targets - they get one
      each), these two will be queued inside generic_make_request.
      
      Assume they are for the same md device.
      The first is processed and causes 1 or more flush requests to be sent
      to lower devices.  These get queued within generic_make_request too.
      Then the second flush to the md device gets handled and it blocks
      waiting for the first flush to complete.  But it won't complete until
      the two lower-device requests complete, and they haven't even been
      submitted yet as they are on the generic_make_request queue.
      
      The deadlock can be broken by using a separate thread to submit the
      requests to lower devices.  md has such a thread readily available:
      md_wq.
      
      So use it to submit these requests.
      Reported-by: Giacomo Catenazzi <cate@cateee.net>
      Tested-by: Giacomo Catenazzi <cate@cateee.net>
      Signed-off-by: NeilBrown <neilb@suse.de>
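The ordering problem described above can be reproduced with a small userspace simulation. This is a deliberately simplified sketch, assuming a single FIFO pending list, two lower devices, and a worker whose submissions complete independently of that list; `toy_flush_deadlocks` and the `toy_req` enum are hypothetical names and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_req { MD_FLUSH, LOWER_FLUSH };

#define QUEUE_MAX 16

/*
 * Toy model: returns true if the second md flush would block forever.
 * use_worker == false: lower-device flushes are appended to the caller's
 *                      pending list, as with recursion avoidance.
 * use_worker == true:  a separate worker submits them, so in this model they
 *                      complete without waiting for the pending list.
 */
static bool toy_flush_deadlocks(bool use_worker)
{
    enum toy_req queue[QUEUE_MAX];
    int head = 0, tail = 0;
    bool flush_active = false;      /* md allows one flush at a time */
    int lower_outstanding = 0;      /* lower flushes not yet processed */
    const int lower_devices = 2;

    /* A dm table with two targets queues two flushes for the same md device. */
    queue[tail++] = MD_FLUSH;
    queue[tail++] = MD_FLUSH;

    while (head < tail) {
        enum toy_req r = queue[head++];

        if (r == MD_FLUSH) {
            if (flush_active)
                return true;        /* waits on work queued behind itself */
            flush_active = true;
            if (!use_worker) {
                for (int i = 0; i < lower_devices; i++) {
                    assert(tail < QUEUE_MAX);
                    queue[tail++] = LOWER_FLUSH;
                    lower_outstanding++;
                }
            }
            if (lower_outstanding == 0)
                flush_active = false;   /* worker path: flush already done */
        } else if (--lower_outstanding == 0) {
            flush_active = false;       /* last lower flush completed */
        }
    }
    return false;
}
```

With `use_worker` false the second md flush is dequeued while the lower-device flushes still sit at the tail of the list, so the model reports a deadlock; with `use_worker` true the list drains.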
    • md: move code in to submit_flushes. · a7a07e69
      NeilBrown authored
      submit_flushes is called from exactly one place.
      Move the code that is before and after that call into
      submit_flushes.
      
      This has no functional change, but will make the next patch
      smaller and easier to follow.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: remove handling of flush_pending in md_submit_flush_data · 2b74e12e
      NeilBrown authored
      None of the functions called between setting flush_pending to 1 and
      the atomic_dec_and_test can change flush_pending, nor will anything
      running in any other thread (as ->flush_bio is not NULL).  So the
      atomic_dec_and_test will always succeed.
      So remove the atomic_set and the atomic_dec_and_test.
      Signed-off-by: NeilBrown <neilb@suse.de>
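The argument above reduces to a small invariant: a counter set to 1 and then decremented, with nothing touching it in between, always reaches zero. A minimal userspace sketch using C11 atomics follows; `toy_dec_and_test` is a hypothetical analogue of the kernel's atomic_dec_and_test, written here only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of atomic_dec_and_test(): true if the counter hits zero. */
static bool toy_dec_and_test(atomic_int *v)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(v, 1) == 1;
}

/* Set the counter to 1, then decrement it with no other writer in between. */
static bool toy_set_then_dec(void)
{
    atomic_int flush_pending;

    atomic_init(&flush_pending, 1);
    /* ... work that, by assumption, never modifies flush_pending ... */
    return toy_dec_and_test(&flush_pending);
}
```

Because the result is always true under that assumption, the set/decrement pair carries no information and can be dropped, which is what the patch does.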
  2. 24 Nov, 2010 3 commits
    • md: Call blk_queue_flush() to establish flush/fua support · be20e6c6
      Darrick J. Wong authored
      Before 2.6.37, the md layer had a mechanism for catching I/Os with the
      barrier flag set, and translating the barrier into barriers for all
      the underlying devices.  With 2.6.37, I/O barriers have become plain
      old flushes, and the md code was updated to reflect this.  However,
      one piece was left out -- the md layer does not tell the block layer
      that it supports flushes or FUA access at all, which results in md
      silently dropping flush requests.
      
      Since the support already seems to be there, just add this one piece of
      bookkeeping.
      Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: really fix recovery looping when single good device fails. · 8f9e0ee3
      NeilBrown authored
      Commit 4044ba58 supposedly fixed a
      problem where if a raid1 with just one good device gets a read-error
      during recovery, the recovery would abort and immediately restart in
      an infinite loop.
      
      However it depended on raid1_remove_disk removing the spare device
      from the array.  But that does not happen in this case.  So add a test
      so that in the 'recovery_disabled' case, the device will be removed.
      
      This is suitable for any kernel since 2.6.29, which is when
      recovery_disabled was introduced.
      
      Cc: stable@kernel.org
      Reported-by: Sebastian Färber <faerber@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix return value of rdev_size_change() · c26a44ed
      Justin Maggard authored
      When trying to grow an array by enlarging component devices,
      rdev_size_store() expects the return value of rdev_size_change() to be
      in sectors, but the actual value is returned in KBs.
      
      This functionality was broken by commit
           dd8ac336
      so this patch is suitable for any kernel since 2.6.30.
      
      Cc: stable@kernel.org
      Signed-off-by: Justin Maggard <jmaggard10@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
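The unit mismatch described above is easy to see with numbers. The sketch below assumes 512-byte sectors and 1024-byte KBs, so one KB is two sectors; the helper names are hypothetical and are not the md functions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed units: 1 KB = 1024 bytes, 1 sector = 512 bytes. */
static uint64_t toy_kb_to_sectors(uint64_t kb)
{
    assert(kb <= UINT64_MAX / 2);   /* guard against overflow */
    return kb * 2;
}

static uint64_t toy_sectors_to_kb(uint64_t sectors)
{
    return sectors / 2;
}

/*
 * A caller that expects sectors but is handed KBs ends up with half the
 * real size: it halves a value that was already expressed in KBs.
 */
static uint64_t toy_misread_size_kb(uint64_t value_in_kb)
{
    return toy_sectors_to_kb(value_in_kb);
}
```

For a 1 GiB component (1048576 KB, 2097152 sectors), a caller that misreads the KB value as sectors would compute 524288 KB, half the real size.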
  3. 21 Nov, 2010 1 commit
  4. 20 Nov, 2010 3 commits
    • Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · b86db474
      Linus Torvalds authored
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard
        fs: Do not dispatch FITRIM through separate super_operation
        ext4: ext4_fill_super shouldn't return 0 on corruption
        jbd2: fix /proc/fs/jbd2/<dev> when using an external journal
        ext4: missing unlock in ext4_clear_request_list()
        ext4: fix setting random pages PageUptodate
    • ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard · e681c047
      Lukas Czerner authored
      A filesystem-independent ioctl was rejected as not common enough to be in
      the core vfs ioctl. Since we still need access to this functionality, this
      commit adds an ext4-specific ioctl, EXT4_IOC_TRIM, to dispatch
      ext4_trim_fs().
      
      It takes an fstrim_range structure as an argument. fstrim_range is defined
      in include/linux/fs.h and its definition is as follows.
      
      struct fstrim_range {
      	__u64 start;
      	__u64 len;
      	__u64 minlen;
      };
      
      start	- first Byte to trim
      len	- number of Bytes to trim from start
      minlen	- minimum extent length to trim, free extents shorter than this
        number of Bytes will be ignored. This will be rounded up to fs
        block size.
      
      After the FITRIM is done, the number of actually discarded Bytes is stored
      in fstrim_range.len to give the user better insight on how much storage
      space has been really released for wear-leveling.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
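From userspace, a trim request of the kind described above might look like the sketch below. It assumes a Linux system whose headers expose `FITRIM` and `struct fstrim_range` in `<linux/fs.h>`; the wrapper name `trim_range` is hypothetical, and the call usually needs elevated privileges and a filesystem and device that support discard.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* struct fstrim_range, FITRIM (assumed available) */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the filesystem containing `path` to discard free extents in
 * [start, start + len) that are at least `minlen` bytes long.
 * Returns 0 and stores the number of discarded bytes in *discarded on
 * success, or -1 on failure with errno set by open() or ioctl().
 */
static int trim_range(const char *path, uint64_t start, uint64_t len,
                      uint64_t minlen, uint64_t *discarded)
{
    struct fstrim_range range;
    int fd, ret, saved_errno;

    memset(&range, 0, sizeof(range));
    range.start = start;
    range.len = len;
    range.minlen = minlen;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ret = ioctl(fd, FITRIM, &range);
    saved_errno = errno;            /* close() may overwrite errno */
    if (ret == 0 && discarded)
        *discarded = range.len;     /* written back with the bytes discarded */

    close(fd);
    errno = saved_errno;
    return ret == 0 ? 0 : -1;
}
```

Passing `UINT64_MAX` as `len` with `start` 0 is a common way to ask for the whole filesystem; on return, `*discarded` reflects the `fstrim_range.len` value the message says is updated after the trim.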
    • fs: Do not dispatch FITRIM through separate super_operation · 93bb41f4
      Lukas Czerner authored
      There was concern that the FITRIM ioctl is not common enough to be included
      in the core vfs ioctl. As Christoph Hellwig pointed out, there's no real
      point in dispatching this out to a separate vector instead of just through
      ->ioctl.
      
      So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs
      from super_operation structure.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 19 Nov, 2010 15 commits
  6. 18 Nov, 2010 13 commits