An error occurred fetching the project authors.
- 11 Oct, 2011 3 commits
-
-
NeilBrown authored
Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
Having mddev_t and 'struct mddev_s' is ugly and not preferred Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h' which used to be an include file that defined this thing. Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 28 Jul, 2011 3 commits
-
-
NeilBrown authored
On a successful write to a known bad block, flag the sh so that raid5d can remove the known bad block from the list. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
When a write error is detected, don't mark the device as failed immediately but rather record the fact for handle_stripe to deal with. Handle_stripe then attempts to record a bad block. Only if that fails does the device get marked as faulty. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
If we get an uncorrectable read error - record a bad block rather than failing the device. And if these errors (which may be due to known bad blocks) cause recovery to be impossible, record a bad block on the recovering devices, or abort the recovery. As we might abort a recovery without failing a device we need to teach RAID5 about recovery_disabled handling. Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 26 Jul, 2011 4 commits
-
-
NeilBrown authored
Adding these three fields will allow more common code to be moved to handle_stripe() struct field rearrangement by Namhyung Kim. Signed-off-by:
NeilBrown <neilb@suse.de> Reviewed-by:
Namhyung Kim <namhyung@gmail.com>
-
NeilBrown authored
'struct stripe_head_state' stores state about the 'current' stripe that is passed around while handling the stripe. For RAID6 there is an extension structure: r6_state, which is also passed around. There is no value in keeping these separate, so move the fields from the latter into the former. This means that all code now needs to treat s->failed_num as an small array, but this is a small cost. Signed-off-by:
NeilBrown <neilb@suse.de> Reviewed-by:
Namhyung Kim <namhyung@gmail.com>
-
NeilBrown authored
sh->lock is now mainly used to ensure that two threads aren't running in the locked part of handle_stripe[56] at the same time. That can more neatly be achieved with an 'active' flag which we set while running handle_stripe. If we find the flag is set, we simply requeue the stripe for later by setting STRIPE_HANDLE. For safety we take ->device_lock while examining the state of the stripe and creating a summary in 'stripe_head_state / r6_state'. This possibly isn't needed but as shared fields like ->toread, ->towrite are checked it is safer for now at least. We leave the label after the old 'unlock' called "unlock" because it will disappear in a few patches, so renaming seems pointless. This leaves the stripe 'locked' for longer as we clear STRIPE_ACTIVE later, but that is not a problem. Signed-off-by:
NeilBrown <neilb@suse.de> Reviewed-by:
Namhyung Kim <namhyung@gmail.com>
-
NeilBrown authored
This is the start of a series of patches to remove sh->lock. sync_request takes sh->lock before setting STRIPE_SYNCING to ensure there is no race with testing it in handle_stripe[56]. Instead, use a new flag STRIPE_SYNC_REQUESTED and test it early in handle_stripe[56] (after getting the same lock) and perform the same set/clear operations if it was set. Signed-off-by:
NeilBrown <neilb@suse.de> Reviewed-by:
Namhyung Kim <namhyung@gmail.com>
-
- 18 Apr, 2011 1 commit
-
-
NeilBrown authored
md has some plugging infrastructure for RAID5 to use because the normal plugging infrastructure required a 'request_queue', and when called from dm, RAID5 doesn't have one of those available. This relied on the ->unplug_fn callback which doesn't exist any more. So remove all of that code, both in md and raid5. Subsequent patches with restore the plugging functionality. Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 10 Mar, 2011 1 commit
-
-
Jens Axboe authored
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
- 10 Sep, 2010 1 commit
-
-
Tejun Heo authored
This patch converts md to support REQ_FLUSH/FUA instead of now deprecated REQ_HARDBARRIER. In the core part (md.c), the following changes are notable. * Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with processing of other requests and thus there is no reason to mark the queue congested while FLUSH/FUA is in progress. * REQ_FLUSH/FUA failures are final and its users don't need retry logic. Retry logic is removed. * Preflush needs to be issued to all member devices but FUA writes can be handled the same way as other writes - their processing can be deferred to request_queue of member devices. md_barrier_request() is renamed to md_flush_request() and simplified accordingly. For linear, raid0 and multipath, the core changes are enough. raid1, 5 and 10 need the following conversions. * raid1: Handling of FLUSH/FUA bio's can simply be deferred to request_queues of member devices. Barrier related logic removed. * raid5: Queue draining logic dropped. FUA bit is propagated through biodrain and stripe resconstruction such that all the updated parts of the stripe are written out with FUA writes if any of the dirtying writes was FUA. preread_active_stripes handling in make_request() is updated as suggested by Neil Brown. * raid10: FUA bit needs to be propagated to write clones. linear, raid0, 1, 5 and 10 tested. Signed-off-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Neil Brown <neilb@suse.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
- 26 Jul, 2010 4 commits
-
-
NeilBrown authored
Also remove remaining accesses to ->queue and ->gendisk when ->queue is NULL (As it is in a DM target). Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
md/raid5 uses the plugging infrastructure provided by the block layer and 'struct request_queue'. However when we plug raid5 under dm there is no request queue so we cannot use that. So create a similar infrastructure that is much lighter weight and use it for raid5. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
the dm module will need this for dm-raid45. Also only access ->queue->backing_dev_info->congested_fn if ->queue actually exists. It won't in a dm target. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
We will shortly allow md devices with no gendisk (they are attached to a dm-target instead). That will cause mdname() to return 'mdX'. There is one place where mdname really needs to be unique: when creating the name for a slab cache. So in that case, if there is no gendisk, you the address of the mddev formatted in HEX to provide a unique name. Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 21 Jul, 2010 1 commit
-
-
NeilBrown authored
Separate the actual 'change' code from the sysfs interface so that it can eventually be called internally. Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 17 Feb, 2010 1 commit
-
-
Tejun Heo authored
Add __percpu sparse annotations to places which didn't make it in one of the previous patches. All converions are trivial. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Borislav Petkov <borislav.petkov@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Neil Brown <neilb@suse.de>
-
- 16 Oct, 2009 2 commits
-
-
NeilBrown authored
Signed-off-by:
NeilBrown <neilb@suse.de>
-
Dan Williams authored
The percpu conversion allowed a straightforward handoff of stripe processing to the async subsytem that initially showed some modest gains (+4%). However, this model is too simplistic and leads to stripes bouncing between raid5d and the async thread pool for every invocation of handle_stripe(). As reported by Holger this can fall into a pathological situation severely impacting throughput (6x performance loss). By downleveling the parallelism to raid_run_ops the pathological stripe_head bouncing is eliminated. This version still exhibits an average 11% throughput loss for: mdadm --create /dev/md0 /dev/sd[b-q] -n 16 -l 6 echo 1024 > /sys/block/md0/md/stripe_cache_size dd if=/dev/zero of=/dev/md0 bs=1024k count=2048 ...but the results are at least stable and can be used as a base for further multicore experimentation. Reported-by:
Holger Kiehl <Holger.Kiehl@dwd.de> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 30 Aug, 2009 4 commits
-
-
Dan Williams authored
[ Based on an original patch by Yuri Tikhonov ] The raid_run_ops routine uses the asynchronous offload api and the stripe_operations member of a stripe_head to carry out xor+pq+copy operations asynchronously, outside the lock. The operations performed by RAID-6 are the same as in the RAID-5 case except for no support of STRIPE_OP_PREXOR operations. All the others are supported: STRIPE_OP_BIOFILL - copy data into request buffers to satisfy a read request STRIPE_OP_COMPUTE_BLK - generate missing blocks (1 or 2) in the cache from the other blocks STRIPE_OP_BIODRAIN - copy data out of request buffers to satisfy a write request STRIPE_OP_RECONSTRUCT - recalculate parity for new data that has entered the cache STRIPE_OP_CHECK - verify that the parity is correct The flow is the same as in the RAID-5 case, and reuses some routines, namely: 1/ ops_complete_postxor (renamed to ops_complete_reconstruct) 2/ ops_complete_compute (updated to set up to 2 targets uptodate) 3/ ops_run_check (renamed to ops_run_check_p for xor parity checks) [neilb@suse.de: fixes to get it to pass mdadm regression suite] Reviewed-by:
Andre Noll <maan@systemlinux.org> Signed-off-by:
Yuri Tikhonov <yur@emcraft.com> Signed-off-by:
Ilya Yanok <yanok@emcraft.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
Replace the flat zero_sum_result with a collection of flags to contain the P (xor) zero-sum result, and the soon to be utilized Q (raid6 reed solomon syndrome) zero-sum result. Use the SUM_CHECK_ namespace instead of DMA_ since these flags will be used on non-dma-zero-sum enabled platforms. Reviewed-by:
Andre Noll <maan@systemlinux.org> Acked-by:
Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
Use percpu memory rather than stack for storing the buffer lists used in parity calculations. Include space for dma address conversions and pass that to async_tx via the async_submit_ctl.scribble pointer. [ Impact: move memory pressure from stack to heap ] Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
In preparation for asynchronous handling of raid6 operations move the spare page to a percpu allocation to allow multiple simultaneous synchronous raid6 recovery operations. Make this allocation cpu hotplug aware to maximize allocation efficiency. Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- 17 Jun, 2009 1 commit
-
-
Andre Noll authored
This kills some more shifts. Signed-off-by:
Andre Noll <maan@systemlinux.org> Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 16 Jun, 2009 1 commit
-
-
NeilBrown authored
Having a macro just to cast a void* isn't really helpful. I would must rather see that we are simply de-referencing ->private, than have to know what the macro does. So open code the macro everywhere and remove the pointless cast. Signed-off-by:
NeilBrown <neilb@suse.de>
-
- 31 Mar, 2009 13 commits
-
-
NeilBrown authored
We currently update the metadata : 1/ every 3Megabytes 2/ When the place we will write new-layout data to is recorded in the metadata as still containing old-layout data. Rule one exists to avoid having to re-do too much reshaping in the face of a crash/restart. So it should really be time based rather than size based. So change it to "every 10 seconds". Rule two turns out to be too harsh when restriping an array 'in-place', as in that case the metadata much be updates for every stripe. For the in-place update, it can only possibly be safe from a crash if some user-space program data a backup of every e.g. few hundred stripes before allowing them to be reshaped. In that case, the constant metadata update is pointless. So only update the metadata if the new metadata will report that the end of the 'old-layout' data is beyond where we are currently writing 'new-layout' data. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
Add prev_algo to raid5_conf_t along the same lines as prev_chunk and previous_raid_disks. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
Add "prev_chunk" to raid5_conf_t, similar to "previous_raid_disks", to remember what the chunk size was before the reshape that is currently underway. This seems like duplication with "chunk_size" and "new_chunk" in mddev_t, and to some extent it is, but there are differences. The values in mddev_t are always defined and often the same. The prev* values are only defined if a reshape is underway. Also (and more significantly) the raid5_conf_t values will be changed at the same time (inside an appropriate lock) that the reshape is started by setting reshape_position. In contrast, the new_chunk value is set when the sysfs file is written which could be well before the reshape starts. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
During a raid5 reshape, we have some stripes in the cache that are 'before' the reshape (and are still to be processed) and some that are 'after'. They are currently differentiated by having different ->disks values as the only reshape current supported involves changing the number of disks. However we will soon support reshapes that do not change the number of disks (chunk parity or chunk size). So make the difference more explicit with a 'generation' number. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
When reducing the number of devices in a raid4/5/6, the reshape process has to start at the end of the array and work down to the beginning. So we need to handle expand_progress and expand_lo differently. This patch renames "expand_progress" and "expand_lo" to avoid the implication that anything is getting bigger (expand->reshape) and every place they are used, we make sure that they are used the right way depending on whether delta_disks is positive or negative. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
We now have this value in stripe_head so we don't need to duplicate it. Signed-off-by:
NeilBrown <neilb@suse.de>
-
Dan Williams authored
Move the raid6 data processing routines into a standalone module (raid6_pq) to prepare them to be called from async_tx wrappers and other non-md drivers/modules. This precludes a circular dependency of raid456 needing the async modules for data processing while those modules in turn depend on raid456 for the base level synchronous raid6 routines. To support this move: 1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h 2/ The raid6_call, recovery calls, and table symbols are exported 3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to compile Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
.. so that the code to create the private data structures is separate. This will help with future code to change the level of an active array. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
DDF requires RAID6 calculations over different devices in a different order. For md/raid6, we calculate over just the data devices, starting immediately after the 'Q' block. For ddf/raid6 we calculate over all devices, using zeros in place of the P and Q blocks. This requires unfortunately complex loops... Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
DDF uses different layouts for P and Q blocks than current md/raid6 so add those that are missing. Also add support for RAID6 layouts that are identical to various raid5 layouts with the simple addition of one device to hold all of the 'Q' blocks. Finally add 'raid5' layouts to match raid4. These last to will allow online level conversion. Note that this does not provide correct support for DDF/raid6 yet as the order in which data blocks are summed to produce the Q block is significant and different between current md code and DDF requirements. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
Code currently assumes that the devices in a raid6 stripe are 0 1 ... N-1 P Q in some rotated order. We will shortly add new layouts in which this strict pattern is broken. So remove this expectation. We still assume that the data disks are roughly in-order. However P and Q can be inserted anywhere within that order. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by:
NeilBrown <neilb@suse.de>
-
Christoph Hellwig authored
Move the headers with the local structures for the disciplines and bitmap.h into drivers/md/ so that they are more easily grepable for hacking and not far away. md.h is left where it is for now as there are some uses from the outside. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
NeilBrown <neilb@suse.de>
-