  1. 26 Jul, 2011 4 commits
    • md/raid5: add some more fields to stripe_head_state · c5709ef6
      NeilBrown authored
      Adding these three fields will allow more common code to be moved
      to handle_stripe()
      
      struct field rearrangement by Namhyung Kim.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Namhyung Kim <namhyung@gmail.com>
    • md/raid5: unify stripe_head_state and r6_state · f2b3b44d
      NeilBrown authored
      'struct stripe_head_state' stores state about the 'current' stripe
      that is passed around while handling the stripe.
      For RAID6 there is an extension structure: r6_state, which is also
      passed around.
      There is no value in keeping these separate, so move the fields from
      the latter into the former.
      
      This means that all code now needs to treat s->failed_num as a small
      array, but this is a small cost.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Namhyung Kim <namhyung@gmail.com>
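
      A minimal sketch of the resulting shape; the surrounding fields are
      elided and illustrative, but failed/failed_num follow the text above:

        struct stripe_head_state {
                /* ... other per-stripe summary fields ... */
                int failed;          /* number of failed devices */
                int failed_num[2];   /* slot 1 is only used by RAID6 */
        };

        /* RAID5 code indexes slot 0 where it previously read a scalar: */
        int fdev = s->failed_num[0];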
    • md/raid5: replace sh->lock with an 'active' flag. · c4c1663b
      NeilBrown authored
      sh->lock is now mainly used to ensure that two threads aren't running
      in the locked part of handle_stripe[56] at the same time.
      
      That can more neatly be achieved with an 'active' flag which we set
      while running handle_stripe.  If we find the flag is set, we simply
      requeue the stripe for later by setting STRIPE_HANDLE.
      
      For safety we take ->device_lock while examining the state of the
      stripe and creating a summary in 'stripe_head_state / r6_state'.
      This possibly isn't needed, but as shared fields like ->toread and
      ->towrite are checked, it is safer for now at least.
      
      We leave the label after the old 'unlock' called "unlock" because it
      will disappear in a few patches, so renaming seems pointless.
      
      This leaves the stripe 'locked' for longer as we clear STRIPE_ACTIVE
      later, but that is not a problem.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Namhyung Kim <namhyung@gmail.com>
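
      A hedged sketch of that scheme, using the flag and lock names from the
      text above (the real handle_stripe body is far larger):

        static void handle_stripe(struct stripe_head *sh)
        {
                /* Another thread is already handling this stripe:
                 * just queue it for another pass later. */
                if (test_and_set_bit(STRIPE_ACTIVE, &sh->state)) {
                        set_bit(STRIPE_HANDLE, &sh->state);
                        return;
                }

                /* Summarise shared fields (->toread, ->towrite, ...)
                 * into stripe_head_state / r6_state under ->device_lock. */
                spin_lock(&sh->raid_conf->device_lock);
                /* ... build the state summary ... */
                spin_unlock(&sh->raid_conf->device_lock);

                /* ... act on the summary ... */

                /* Cleared late, so the stripe stays 'locked' longer. */
                clear_bit(STRIPE_ACTIVE, &sh->state);
        }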
    • md/raid5: Remove use of sh->lock in sync_request · 83206d66
      NeilBrown authored
      This is the start of a series of patches to remove sh->lock.
      
      sync_request takes sh->lock before setting STRIPE_SYNCING to ensure
      there is no race with testing it in handle_stripe[56].
      
      Instead, use a new flag STRIPE_SYNC_REQUESTED and test it early
      in handle_stripe[56] (after getting the same lock) and perform the
      same set/clear operations if it was set.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Namhyung Kim <namhyung@gmail.com>
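
      A minimal sketch of the handoff (locking context elided; assuming the
      set/clear pair in question is STRIPE_SYNCING/STRIPE_INSYNC):

        /* sync_request(): request the transition instead of taking sh->lock */
        set_bit(STRIPE_SYNC_REQUESTED, &sh->state);

        /* early in handle_stripe[56](), under the same lock as before: */
        if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
                set_bit(STRIPE_SYNCING, &sh->state);
                clear_bit(STRIPE_INSYNC, &sh->state);
        }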
  2. 18 Apr, 2011 1 commit
    • md - remove old plugging code. · 482c0834
      NeilBrown authored
      md has some plugging infrastructure for RAID5 to use because the
      normal plugging infrastructure required a 'request_queue', and when
      called from dm, RAID5 doesn't have one of those available.
      
      This relied on the ->unplug_fn callback which doesn't exist any more.
      
      So remove all of that code, both in md and raid5.  Subsequent patches
      will restore the plugging functionality.
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 10 Sep, 2010 1 commit
    • md: implement REQ_FLUSH/FUA support · e9c7469b
      Tejun Heo authored
      This patch converts md to support REQ_FLUSH/FUA instead of the now
      deprecated REQ_HARDBARRIER.  In the core part (md.c), the following
      changes are notable.
      
      * Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
        processing of other requests and thus there is no reason to mark the
        queue congested while FLUSH/FUA is in progress.
      
      * REQ_FLUSH/FUA failures are final and their users don't need retry
        logic.  Retry logic is removed.
      
      * Preflush needs to be issued to all member devices but FUA writes can
        be handled the same way as other writes - their processing can be
        deferred to request_queue of member devices.  md_barrier_request()
        is renamed to md_flush_request() and simplified accordingly.
      
      For linear, raid0 and multipath, the core changes are enough.  raid1,
      5 and 10 need the following conversions.
      
      * raid1: Handling of FLUSH/FUA bio's can simply be deferred to
        request_queues of member devices.  Barrier related logic removed.
      
      * raid5: Queue draining logic dropped.  FUA bit is propagated through
        biodrain and stripe reconstruction such that all the updated parts
        of the stripe are written out with FUA writes if any of the dirtying
        writes was FUA.  preread_active_stripes handling in make_request()
        is updated as suggested by Neil Brown.
      
      * raid10: FUA bit needs to be propagated to write clones.
      
      linear, raid0, 1, 5 and 10 tested.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
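
      For raid5, a hedged sketch of the FUA propagation described above
      (R5_WantFUA is the per-device flag this patch introduced; the
      surrounding code is simplified):

        /* ops_run_biodrain(): remember FUA from any dirtying bio */
        if (wbi->bi_rw & REQ_FUA)
                set_bit(R5_WantFUA, &dev->flags);

        /* later, when the stripe's updated blocks are written back: */
        if (test_and_clear_bit(R5_WantFUA, &dev->flags))
                rw |= REQ_FUA;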
  4. 17 Feb, 2010 1 commit
    • percpu: add __percpu sparse annotations to what's left · a29d8b8e
      Tejun Heo authored
      Add __percpu sparse annotations to places which didn't make it in one
      of the previous patches.  All conversions are trivial.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
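
      For illustration, the kind of one-line change involved, shown on the
      raid5 per-cpu pointer (an example site, not necessarily this patch's
      exact hunk):

        /* before: plain pointer, sparse cannot tell it is percpu */
        struct raid5_percpu *percpu;

        /* after: annotated, so sparse warns on a direct dereference */
        struct raid5_percpu __percpu *percpu;

        /* access goes through the percpu accessors, e.g.: */
        struct raid5_percpu *p = per_cpu_ptr(conf->percpu, cpu);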
  5. 16 Oct, 2009 2 commits
    • md: fix problems with RAID6 calculations for DDF. · e4424fee
      NeilBrown authored
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid456: downlevel multicore operations to raid_run_ops · 417b8d4a
      Dan Williams authored
      The percpu conversion allowed a straightforward handoff of stripe
      processing to the async subsystem that initially showed some modest gains
      (+4%).  However, this model is too simplistic and leads to stripes
      bouncing between raid5d and the async thread pool for every invocation
      of handle_stripe().  As reported by Holger this can fall into a
      pathological situation severely impacting throughput (6x performance
      loss).
      
      By downleveling the parallelism to raid_run_ops the pathological
      stripe_head bouncing is eliminated.  This version still exhibits an
      average 11% throughput loss for:
      
      	mdadm --create /dev/md0 /dev/sd[b-q] -n 16 -l 6
      	echo 1024 > /sys/block/md0/md/stripe_cache_size
      	dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
      
      ...but the results are at least stable and can be used as a base for
      further multicore experimentation.
      Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  6. 30 Aug, 2009 4 commits
    • md/raid6: asynchronous raid6 operations · ac6b53b6
      Dan Williams authored
      [ Based on an original patch by Yuri Tikhonov ]
      
      The raid_run_ops routine uses the asynchronous offload api and
      the stripe_operations member of a stripe_head to carry out xor+pq+copy
      operations asynchronously, outside the lock.
      
      The operations performed by RAID-6 are the same as in the RAID-5 case
      except that STRIPE_OP_PREXOR is not supported. All the others are
      supported:
      STRIPE_OP_BIOFILL
       - copy data into request buffers to satisfy a read request
      STRIPE_OP_COMPUTE_BLK
       - generate missing blocks (1 or 2) in the cache from the other blocks
      STRIPE_OP_BIODRAIN
       - copy data out of request buffers to satisfy a write request
      STRIPE_OP_RECONSTRUCT
       - recalculate parity for new data that has entered the cache
      STRIPE_OP_CHECK
       - verify that the parity is correct
      
      The flow is the same as in the RAID-5 case, and reuses some routines, namely:
      1/ ops_complete_postxor (renamed to ops_complete_reconstruct)
      2/ ops_complete_compute (updated to set up to 2 targets uptodate)
      3/ ops_run_check (renamed to ops_run_check_p for xor parity checks)
      
      [neilb@suse.de: fixes to get it to pass mdadm regression suite]
      Reviewed-by: Andre Noll <maan@systemlinux.org>
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
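
      A hedged sketch of how raid_run_ops dispatches on those request flags
      (the real routine also chains async_tx descriptors and uses per-cpu
      scratch space; helper signatures are simplified):

        static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
        {
                if (test_bit(STRIPE_OP_BIOFILL, &ops_request))
                        ops_run_biofill(sh);     /* reads: copy out to bios */
                if (test_bit(STRIPE_OP_COMPUTE_BLK, &ops_request))
                        ops_run_compute(sh);     /* rebuild 1 or 2 blocks */
                if (test_bit(STRIPE_OP_BIODRAIN, &ops_request))
                        ops_run_biodrain(sh);    /* writes: copy in from bios */
                if (test_bit(STRIPE_OP_RECONSTRUCT, &ops_request))
                        ops_run_reconstruct(sh); /* new parity via xor or pq */
                if (test_bit(STRIPE_OP_CHECK, &ops_request))
                        ops_run_check(sh);       /* verify P and/or Q */
        }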
    • async_tx: add sum check flags · ad283ea4
      Dan Williams authored
      Replace the flat zero_sum_result with a collection of flags to contain
      the P (xor) zero-sum result, and the soon-to-be-utilized Q (raid6
      Reed-Solomon syndrome) zero-sum result.  Use the SUM_CHECK_ namespace instead
      of DMA_ since these flags will be used on non-dma-zero-sum enabled
      platforms.
      Reviewed-by: Andre Noll <maan@systemlinux.org>
      Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
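
      A sketch of the flag collection this introduces (the exact definitions
      live in the async_tx headers; treat this as an approximation):

        enum sum_check_bits {
                SUM_CHECK_P = 0,   /* xor (P) zero-sum result */
                SUM_CHECK_Q = 1,   /* raid6 syndrome (Q) zero-sum result */
        };

        enum sum_check_flags {
                SUM_CHECK_P_RESULT = (1 << SUM_CHECK_P),
                SUM_CHECK_Q_RESULT = (1 << SUM_CHECK_Q),
        };

        /* a parity mismatch sets the corresponding bit in the mask: */
        enum sum_check_flags result = 0;
        if (p_mismatch)                    /* hypothetical condition */
                result |= SUM_CHECK_P_RESULT;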
    • md/raid5,6: add percpu scribble region for buffer lists · d6f38f31
      Dan Williams authored
      Use percpu memory rather than stack for storing the buffer lists used in
      parity calculations.  Include space for dma address conversions and pass
      that to async_tx via the async_submit_ctl.scribble pointer.
      
      [ Impact: move memory pressure from stack to heap ]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
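
      A hedged sketch of the arrangement: one scratch allocation per cpu,
      sized for the block-pointer list plus dma address conversions, handed
      to async_tx via the submit descriptor (sizes and the to_addr_conv()
      helper are illustrative):

        /* per-cpu scratch: page pointers + dma-address conversion space */
        percpu->scribble = kmalloc(sizeof(struct page *) * (disks + 2) +
                                   sizeof(addr_conv_t) * (disks + 2),
                                   GFP_KERNEL);

        /* each submission passes it along in async_submit_ctl.scribble: */
        struct async_submit_ctl submit;
        init_async_submit(&submit, ASYNC_TX_ACK, tx,
                          ops_complete_reconstruct, sh,
                          to_addr_conv(sh, percpu));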
    • md/raid6: move the spare page to a percpu allocation · 36d1c647
      Dan Williams authored
      In preparation for asynchronous handling of raid6 operations, move the
      spare page to a percpu allocation to allow multiple simultaneous
      synchronous raid6 recovery operations.
      
      Make this allocation cpu hotplug aware to maximize allocation
      efficiency.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  7. 16 Jun, 2009 1 commit
    • md: remove mddev_to_conf "helper" macro · 070ec55d
      NeilBrown authored
      Having a macro just to cast a void* isn't really helpful.  I would
      much rather see that we are simply dereferencing ->private than have
      to know what the macro does.
      
      So open code the macro everywhere and remove the pointless cast.
      Signed-off-by: NeilBrown <neilb@suse.de>
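
      The change in miniature (raid5 shown; each personality had its own
      variant of the macro):

        /* before: a macro that exists only to hide a cast */
        #define mddev_to_conf(mddev) ((raid5_conf_t *) mddev->private)
        raid5_conf_t *conf = mddev_to_conf(mddev);

        /* after: open-coded, so the ->private dereference is visible */
        raid5_conf_t *conf = mddev->private;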
  8. 31 Mar, 2009 13 commits
    • md/raid5 revise rules for when to update metadata during reshape · c8f517c4
      NeilBrown authored
      We currently update the metadata:
       1/ every 3Megabytes
       2/ When the place we will write new-layout data to is recorded in
          the metadata as still containing old-layout data.
      
      Rule one exists to avoid having to re-do too much reshaping in the
      face of a crash/restart.  So it should really be time based rather
      than size based.  So change it to "every 10 seconds".
      
      Rule two turns out to be too harsh when restriping an array
      'in-place', as in that case the metadata must be updated for every
      stripe.
      For the in-place update, it can only possibly be safe from a crash if
      some user-space program takes a backup of, e.g., every few hundred
      stripes before allowing them to be reshaped.  In that case, the
      constant metadata update is pointless.
      So only update the metadata if the new metadata will report that the
      end of the 'old-layout' data is beyond where we are currently
      writing 'new-layout' data.
      Signed-off-by: NeilBrown <neilb@suse.de>
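
      A hedged sketch of the resulting check; reshape_checkpoint is the
      raid5 field of that era, while the two helpers are hypothetical
      stand-ins for the actual comparisons:

        /* update if 10s have passed, or if writing the metadata now would
         * record the end of 'old-layout' data as beyond the point where
         * 'new-layout' data is currently being written: */
        if (time_after(jiffies, conf->reshape_checkpoint + 10 * HZ) ||
            old_layout_end_beyond_writepos(conf))   /* hypothetical */
                request_metadata_update(mddev);      /* hypothetical */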
    • md/raid5: prepare for allowing reshape to change layout · e183eaed
      NeilBrown authored
      Add prev_algo to raid5_conf_t along the same lines as prev_chunk
      and previous_raid_disks.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: prepare for allowing reshape to change chunksize. · 784052ec
      NeilBrown authored
      Add "prev_chunk" to raid5_conf_t, similar to "previous_raid_disks", to
      remember what the chunk size was before the reshape that is currently
      underway.
      
      This seems like duplication with "chunk_size" and "new_chunk" in
      mddev_t, and to some extent it is, but there are differences.
      The values in mddev_t are always defined and often the same.
      The prev* values are only defined if a reshape is underway.
      
      Also (and more significantly) the raid5_conf_t values will be changed
      at the same time (inside an appropriate lock) that the reshape is
      started by setting reshape_position.  In contrast, the new_chunk value
      is set when the sysfs file is written which could be well before the
      reshape starts.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: clearly differentiate 'before' and 'after' stripes during reshape. · 86b42c71
      NeilBrown authored
      During a raid5 reshape, we have some stripes in the cache that are
      'before' the reshape (and are still to be processed) and some that are
      'after'.  They are currently differentiated by having different
      ->disks values, as the only reshape currently supported involves changing
      the number of disks.
      
      However we will soon support reshapes that do not change the number
      of disks (the parity layout or the chunk size).  So make the difference more
      explicit with a 'generation' number.
      Signed-off-by: NeilBrown <neilb@suse.de>
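
      A minimal sketch of the test this enables (both fields live in
      md/raid5 from this commit on):

        /* conf->generation is bumped when a reshape starts; each cached
         * stripe records the generation it was initialised under: */
        if (sh->generation == conf->generation) {
                /* 'after' stripe: interpret with the new geometry */
        } else {
                /* 'before' stripe: still laid out with the old geometry */
        }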
    • md/raid5: change reshape-progress measurement to cope with reshaping backwards. · fef9c61f
      NeilBrown authored
      When reducing the number of devices in a raid4/5/6, the reshape
      process has to start at the end of the array and work down to the
      beginning.  So we need to handle expand_progress and expand_lo
      differently.
      
      This patch renames "expand_progress" and "expand_lo" to avoid the
      implication that anything is getting bigger (expand->reshape) and
      every place they are used, we make sure that they are used the right
      way depending on whether delta_disks is positive or negative.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: drop qd_idx from r6_state · 34e04e87
      NeilBrown authored
      We now have this value in stripe_head so we don't need to duplicate
      it.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6: move raid6 data processing to raid6_pq.ko · f701d589
      Dan Williams authored
      Move the raid6 data processing routines into a standalone module
      (raid6_pq) to prepare them to be called from async_tx wrappers and other
      non-md drivers/modules.  This precludes a circular dependency of raid456
      needing the async modules for data processing while those modules in
      turn depend on raid456 for the base level synchronous raid6 routines.
      
      To support this move:
      1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
      2/ The raid6_call, recovery calls, and table symbols are exported
      3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to
         compile
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
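
      After the split, a caller outside md can use the exported routines
      roughly like this (a sketch; raid6_call is the dispatch table that
      picks the fastest implementation at module init):

        #include <linux/raid/pq.h>

        /* ptrs[] holds 'disks' pointers: data blocks first, then P, then
         * Q; gen_syndrome computes P and Q over 'bytes' bytes per block. */
        raid6_call.gen_syndrome(disks, bytes, ptrs);

        /* the recovery entry points are exported as well, e.g.: */
        raid6_2data_recov(disks, bytes, faila, failb, ptrs);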
    • md/raid5: refactor raid5 "run" · 91adb564
      NeilBrown authored
      .. so that the code to create the private data structures is separate.
      This will help with future code to change the level of an active
      array.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: finish support for DDF/raid6 · 67cc2b81
      NeilBrown authored
      DDF requires RAID6 calculations over different devices in a different
      order.
      For md/raid6, we calculate over just the data devices, starting
      immediately after the 'Q' block.
      For ddf/raid6 we calculate over all devices, using zeros in place of
      the P and Q blocks.
      
      This requires unfortunately complex loops...
      Signed-off-by: NeilBrown <neilb@suse.de>
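
      A rough illustration of the difference (not the actual loop, which
      also handles the rotation; raid6_empty_zero_page is one of the
      symbols exported by the raid6_pq split above):

        /* md/raid6: syndrome over the data blocks only,
         *   ptrs = { d0 .. dN-1, P, Q }, gen_syndrome(N + 2, ...).
         * ddf/raid6: syndrome over every device slot, with zero blocks
         * standing in at the P and Q positions: */
        for (i = 0; i < disks; i++)
                ptrs[i] = (i == pd_idx || i == qd_idx)
                        ? (void *)raid6_empty_zero_page
                        : page_address(sh->dev[i].page);
        ptrs[disks]     = page_address(sh->dev[pd_idx].page);  /* P out */
        ptrs[disks + 1] = page_address(sh->dev[qd_idx].page);  /* Q out */
        raid6_call.gen_syndrome(disks + 2, STRIPE_SIZE, ptrs);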
    • md/raid5: Add support for new layouts for raid5 and raid6. · 99c0fb5f
      NeilBrown authored
      DDF uses different layouts for P and Q blocks than current md/raid6
      so add those that are missing.
      Also add support for RAID6 layouts that are identical to various
      raid5 layouts with the simple addition of one device to hold all of
      the 'Q' blocks.
      Finally add 'raid5' layouts to match raid4.
      These last two will allow online level conversion.
      
      Note that this does not provide correct support for DDF/raid6 yet
      as the order in which data blocks are summed to produce the Q block
      is significant and different between current md code and DDF
      requirements.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6: remove expectation that Q device is immediately after P device. · d0dabf7e
      NeilBrown authored
      Code currently assumes that the devices in a raid6 stripe are
        0 1 ... N-1 P Q
      in some rotated order.  We will shortly add new layouts in which
      this strict pattern is broken.
      So remove this expectation.  We still assume that the data disks
      are roughly in-order.  However P and Q can be inserted anywhere within
      that order.
      Signed-off-by: NeilBrown <neilb@suse.de>
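
      A hedged sketch of walking a stripe under the relaxed assumption
      (raid6_next_disk() is the md helper for the step; handle_data_disk()
      is hypothetical):

        /* data disks stay roughly in order, but P and Q may now sit
         * anywhere within the rotation, so skip them where they appear: */
        int i = d0_idx;                    /* first data device, per layout */
        do {
                if (i != sh->pd_idx && i != sh->qd_idx)
                        handle_data_disk(sh, i);    /* hypothetical */
                i = (i + 1) % disks;                /* raid6_next_disk() */
        } while (i != d0_idx);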
    • md: move lots of #include lines out of .h files and into .c · bff61975
      NeilBrown authored
      This makes the includes more explicit, and is preparation for moving
      md_k.h to drivers/md/md.h
      
      Remove include/raid/md.h as its only remaining use was to #include
      other files.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move headers out of include/linux/raid/ · ef740c37
      Christoph Hellwig authored
      Move the headers with the local structures for the disciplines and
      bitmap.h into drivers/md/ so that they are more easily grepable for
      hacking and not far away.  md.h is left where it is for now as there
      are some uses from the outside.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.de>