1. 25 Feb, 2009 3 commits
    • NeilBrown's avatar
      md: avoid races when stopping resync. · 73d5c38a
      NeilBrown authored
      There has been a race in raid10 and raid1 for a long time
      which has only recently started showing up due to a scheduler changed.
      
      When a sync_read request finishes, as soon as reschedule_retry
      is called, another thread can mark the resync request as having
      completed, so md_do_sync can finish, ->stop can be called, and
      ->conf can be freed.  So using conf after reschedule_retry is not
      safe.
      
      Similarly, when finishing a sync_write, calling md_done_sync must be
      the last thing we do, as it allows a chain of events which will free
      conf and other data structures.
      
      The first of these requires action in raid10.c
      The second requires action in raid1.c and raid10.c
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      73d5c38a
    • NeilBrown's avatar
      md/raid10: Don't call bitmap_cond_end_sync when we are doing recovery. · 78200d45
      NeilBrown authored
      For raid1/4/5/6, resync (fixing inconsistencies between devices) is
      very similar to recovery (rebuilding a failed device onto a spare).
      The both walk through the device addresses in order.
      
      For raid10 it can be quite different.  resync follows the 'array'
      address, and makes sure all copies are the same.  Recover walks
      through 'device' addresses and recreates each missing block.
      
      The 'bitmap_cond_end_sync' function allows the write-intent-bitmap
      (When present) to be updated to reflect a partially completed resync.
      It makes assumptions which mean that it does not work correctly for
      raid10 recovery at all.
      
      In particularly, it can cause bitmap-directed recovery of a raid10 to
      not recovery some of the blocks that need to be recovered.
      
      So move the call to bitmap_cond_end_sync into the resync path, rather
      than being in the common "resync or recovery" path.
      
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      78200d45
    • NeilBrown's avatar
      md/raid10: Don't skip more than 1 bitmap-chunk at a time during recovery. · 09b4068a
      NeilBrown authored
      When doing recovery on a raid10 with a write-intent bitmap, we only
      need to recovery chunks that are flagged in the bitmap.
      
      However if we choose to skip a chunk as it isn't flag, the code
      currently skips the whole raid10-chunk, thus it might not recovery
      some blocks that need recovering.
      
      This patch fixes it.
      
      In case that is confusing, it might help to understand that there
      is a 'raid10 chunk size' which guides how data is distributed across
      the devices, and a 'bitmap chunk size' which says how much data
      corresponds to a single bit in the bitmap.
      
      This bug only affects cases where the bitmap chunk size is smaller
      than the raid10 chunk size.
      
      
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      09b4068a
  2. 21 Feb, 2009 17 commits
  3. 20 Feb, 2009 20 commits