Commit 9164e4a5 authored by Song Liu

Merge branch 'md-suspend-rewrite' into md-next

From Yu Kuai, written by Song Liu

Recent tests with raid10 revealed many issues with the following scenario:

- adding or removing disks to/from the array, while
- IO is issued to the array.

At first, we fixed each problem independently, accepting that IO can run
concurrently with array reconfiguration. However, with more issues being
reported continuously, I'm now hoping to fix these problems thoroughly.

Refer to how the block layer protects IO against queue reconfiguration (for
example, changing the elevator):

blk_mq_freeze_queue
-> wait for all IO to be done, and prevent new IO from being dispatched
// reconfiguration
blk_mq_unfreeze_queue

I think we can do something similar to synchronize IO with array
reconfiguration.
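
In md, the building block for this already exists: 'struct mddev' carries an
'active_io' percpu_ref (visible in the md.h hunk below). The following is a
minimal sketch of how a freeze-style suspend can be built on top of it; the
function names are illustrative, not the exact md implementation:

/* Illustrative sketch: each in-flight IO holds a reference on 'active_io'. */
static void array_freeze(struct mddev *mddev)
{
        /* New IO fails percpu_ref_tryget_live() once the ref is killed. */
        percpu_ref_kill(&mddev->active_io);
        /* Wait until every in-flight IO has dropped its reference. */
        wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
}

static void array_unfreeze(struct mddev *mddev)
{
        /* Re-arm the reference so new IO can be dispatched again. */
        percpu_ref_resurrect(&mddev->active_io);
        wake_up(&mddev->sb_wait);
}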

The current synchronization works as follows. For a reconfiguration
operation:

1. Hold 'reconfig_mutex';
2. Check that the rdev can be added/removed; one condition is that there is
   no IO pending on it (for example, check nr_pending);
3. Do the actual operation to add/remove the rdev; one step is to set/clear
   a pointer to the rdev;
4. Check whether there is still no IO on this rdev; if there is, revert the
   change.

Meanwhile, the IO path uses rcu_read_lock/unlock() to access the rdev.

This scheme has several problems, as the sketch below illustrates:

- RCU is used incorrectly;
- there are many places where the old rdev can be read, but many of them
  don't handle the old value correctly;
- between steps 3 and 4, if new IO is dispatched, NULL will be read for the
  rdev, and data will be lost if step 4 then fails and reverts the change.
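
A condensed sketch of the old remove path, modelled loosely on raid10's
'conf->mirrors[disk].rdev' (simplified for illustration, not the verbatim
kernel code):

/* step 1: caller holds 'reconfig_mutex' */
if (atomic_read(&rdev->nr_pending) == 0) {                  /* step 2 */
        rcu_assign_pointer(conf->mirrors[disk].rdev, NULL); /* step 3 */
        synchronize_rcu();
        if (atomic_read(&rdev->nr_pending)) {               /* step 4 */
                /* IO sneaked in: revert, but readers already saw NULL */
                rcu_assign_pointer(conf->mirrors[disk].rdev, rdev);
                err = -EBUSY;
        }
}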

The new synchronization is similar to blk_mq_freeze_queue(). To add or
remove a disk:

1. Suspend the array, that is, stop new IO from being dispatched and wait
   for inflight IO to finish;
2. Add or remove rdevs to/from the array;
3. Resume the array.

The IO path doesn't need to change for now, and the whole RCU implementation
can be removed. The remove path from the sketch above then simplifies as
shown below.
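
Again a simplified sketch rather than verbatim kernel code (the two-argument
mddev_suspend() is the new API introduced by this series):

mddev_suspend(mddev, false);            /* no new IO; inflight IO has drained */
mutex_lock(&mddev->reconfig_mutex);
conf->mirrors[disk].rdev = NULL;        /* nothing can race with this store */
mutex_unlock(&mddev->reconfig_mutex);
mddev_resume(mddev);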

The main work is divided into 3 steps.

First, make sure the new APIs to suspend the array are general:

- make sure suspending the array will wait for IO to be done (done by [1]);
- make sure suspending the array can be called for all personalities (done by [2]);
- make sure suspending the array can be called at any time (done by [3]);
- make sure suspending the array doesn't rely on 'reconfig_mutex' (PATCH 3-5).

Second, replace the old APIs with the new ones (PATCH 6-16). Specifically,
the synchronization is changed from:

  lock reconfig_mutex
  suspend array
  make changes
  resume array
  unlock reconfig_mutex

to:

  suspend array
  lock reconfig_mutex
  make changes
  unlock reconfig_mutex
  resume array
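
The series adds helpers that bundle this new ordering (their definitions
appear in the md.h hunk below). A sysfs store method then takes the
following shape; 'example_store' is a made-up name for illustration:

static ssize_t example_store(struct mddev *mddev, const char *buf, size_t len)
{
        int err = mddev_suspend_and_lock(mddev);        /* suspend, then lock */

        if (err)
                return err;
        /* ... make changes: array is suspended, 'reconfig_mutex' is held ... */
        mddev_unlock_and_resume(mddev);                 /* unlock, then resume */
        return len;
}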

Finally, for the remaining paths that involve reconfiguration, suspend the
array first (PATCH 11, 12, [4] and PATCH 17).

Preparatory work:
[1] https://lore.kernel.org/all/20230621165110.1498313-1-yukuai1@huaweicloud.com/
[2] https://lore.kernel.org/all/20230628012931.88911-2-yukuai1@huaweicloud.com/
[3] https://lore.kernel.org/all/20230825030956.1527023-1-yukuai1@huaweicloud.com/
[4] https://lore.kernel.org/all/20230825031622.1530464-1-yukuai1@huaweicloud.com/

* md-suspend-rewrite:
  md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
  md: remove old apis to suspend the array
  md: suspend array in md_start_sync() if array need reconfiguration
  md/raid5: replace suspend with quiesce() callback
  md/md-linear: cleanup linear_add()
  md: cleanup mddev_create/destroy_serial_pool()
  md: use new apis to suspend array before mddev_create/destroy_serial_pool
  md: use new apis to suspend array for ioctls involved array reconfiguration
  md: use new apis to suspend array for adding/removing rdev from state_store()
  md: use new apis to suspend array for sysfs apis
  md/raid5: use new apis to suspend array
  md/raid5-cache: use new apis to suspend array
  md/md-bitmap: use new apis to suspend array for location_store()
  md/dm-raid: use new apis to suspend array
  md: add new helpers to suspend/resume and lock/unlock array
  md: add new helpers to suspend/resume array
  md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
  md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
  md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
parents 9e55a22f 2b16a525
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
@@ -3244,7 +3244,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);

 	/* Has to be held on running the array */
-	mddev_lock_nointr(&rs->md);
+	mddev_suspend_and_lock_nointr(&rs->md);
 	r = md_run(&rs->md);
 	rs->md.in_sync = 0; /* Assume already marked dirty */
 	if (r) {
@@ -3268,7 +3268,6 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		}
 	}

-	mddev_suspend(&rs->md);
 	set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags);

 	/* Try to adjust the raid4/5/6 stripe cache size to the stripe size */
@@ -3798,9 +3797,7 @@ static void raid_postsuspend(struct dm_target *ti)
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);

-		mddev_lock_nointr(&rs->md);
-		mddev_suspend(&rs->md);
-		mddev_unlock(&rs->md);
+		mddev_suspend(&rs->md, false);
 	}
 }
@@ -4059,8 +4056,7 @@ static void raid_resume(struct dm_target *ti)
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		mddev->ro = 0;
 		mddev->in_sync = 0;
-		mddev_resume(mddev);
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 	}
 }
diff --git a/drivers/md/md-autodetect.c b/drivers/md/md-autodetect.c
@@ -175,7 +175,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		return;
 	}

-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err) {
 		pr_err("md: failed to lock array %s\n", name);
 		goto out_mddev_put;
@@ -221,7 +221,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 	if (err)
 		pr_warn("md: starting %s failed\n", name);
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 out_mddev_put:
 	mddev_put(mddev);
 }
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
@@ -1861,7 +1861,7 @@ void md_bitmap_destroy(struct mddev *mddev)
 	md_bitmap_wait_behind_writes(mddev);
 	if (!mddev->serialize_policy)
-		mddev_destroy_serial_pool(mddev, NULL, true);
+		mddev_destroy_serial_pool(mddev, NULL);

 	mutex_lock(&mddev->bitmap_info.mutex);
 	spin_lock(&mddev->lock);
@@ -1977,7 +1977,7 @@ int md_bitmap_load(struct mddev *mddev)
 		goto out;

 	rdev_for_each(rdev, mddev)
-		mddev_create_serial_pool(mddev, rdev, true);
+		mddev_create_serial_pool(mddev, rdev);

 	if (mddev_is_clustered(mddev))
 		md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
@@ -2348,11 +2348,10 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 {
 	int rv;

-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
-	mddev_suspend(mddev);

 	if (mddev->pers) {
 		if (mddev->recovery || mddev->sync_thread) {
 			rv = -EBUSY;
@@ -2429,8 +2428,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 	rv = 0;
 out:
-	mddev_resume(mddev);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	if (rv)
 		return rv;
 	return len;
@@ -2539,7 +2537,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (backlog > COUNTER_MAX)
 		return -EINVAL;

-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;
@@ -2564,16 +2562,16 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
 		if (!mddev->serialize_policy)
-			mddev_destroy_serial_pool(mddev, NULL, false);
+			mddev_destroy_serial_pool(mddev, NULL);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
 		rdev_for_each(rdev, mddev)
-			mddev_create_serial_pool(mddev, rdev, false);
+			mddev_create_serial_pool(mddev, rdev);
 	}
 	if (old_mwb != backlog)
 		md_bitmap_update_sb(mddev->bitmap);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return len;
 }
diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
@@ -183,7 +183,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	 * in linear_congested(), therefore kfree_rcu() is used to free
 	 * oldconf until no one uses it anymore.
 	 */
-	mddev_suspend(mddev);
 	oldconf = rcu_dereference_protected(mddev->private,
 			lockdep_is_held(&mddev->reconfig_mutex));
 	mddev->raid_disks++;
@@ -192,7 +191,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	rcu_assign_pointer(mddev->private, newconf);
 	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 	set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
-	mddev_resume(mddev);
 	kfree_rcu(oldconf, rcu);
 	return 0;
 }
(The diff for drivers/md/md.c is collapsed in this view.)
diff --git a/drivers/md/md.h b/drivers/md/md.h
@@ -248,10 +248,6 @@ struct md_cluster_info;
  *				become failed.
  * @MD_HAS_PPL:			The raid array has PPL feature set.
  * @MD_HAS_MULTIPLE_PPLS:	The raid array has multiple PPLs feature set.
- * @MD_ALLOW_SB_UPDATE:		md_check_recovery is allowed to update the metadata
- *				without taking reconfig_mutex.
- * @MD_UPDATING_SB:		md_check_recovery is updating the metadata without
- *				explicitly holding reconfig_mutex.
  * @MD_NOT_READY:		do_md_run() is active, so 'array_state' must not report that
  *				array is ready yet.
  * @MD_BROKEN:			This is used to stop writes and mark array as failed.
@@ -268,8 +264,6 @@ enum mddev_flags {
 	MD_FAILFAST_SUPPORTED,
 	MD_HAS_PPL,
 	MD_HAS_MULTIPLE_PPLS,
-	MD_ALLOW_SB_UPDATE,
-	MD_UPDATING_SB,
 	MD_NOT_READY,
 	MD_BROKEN,
 	MD_DELETED,
@@ -316,6 +310,7 @@ struct mddev {
 	unsigned long			sb_flags;

 	int				suspended;
+	struct mutex			suspend_mutex;
 	struct percpu_ref		active_io;
 	int				ro;
 	int				sysfs_active;	/* set when sysfs deletes
@@ -809,15 +804,14 @@ extern int md_rdev_init(struct md_rdev *rdev);
 extern void md_rdev_clear(struct md_rdev *rdev);

 extern void md_handle_request(struct mddev *mddev, struct bio *bio);
-extern void mddev_suspend(struct mddev *mddev);
+extern int mddev_suspend(struct mddev *mddev, bool interruptible);
 extern void mddev_resume(struct mddev *mddev);

 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
-extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-				     bool is_suspend);
-extern void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
-				      bool is_suspend);
+extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev);
+extern void mddev_destroy_serial_pool(struct mddev *mddev,
+				      struct md_rdev *rdev);
 struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr);
 struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev);
@@ -855,6 +849,33 @@ static inline void mddev_check_write_zeroes(struct mddev *mddev, struct bio *bio)
 		mddev->queue->limits.max_write_zeroes_sectors = 0;
 }

+static inline int mddev_suspend_and_lock(struct mddev *mddev)
+{
+	int ret;
+
+	ret = mddev_suspend(mddev, true);
+	if (ret)
+		return ret;
+
+	ret = mddev_lock(mddev);
+	if (ret)
+		mddev_resume(mddev);
+
+	return ret;
+}
+
+static inline void mddev_suspend_and_lock_nointr(struct mddev *mddev)
+{
+	mddev_suspend(mddev, false);
+	mutex_lock(&mddev->reconfig_mutex);
+}
+
+static inline void mddev_unlock_and_resume(struct mddev *mddev)
+{
+	mddev_unlock(mddev);
+	mddev_resume(mddev);
+}
+
 struct mdu_array_info_s;
 struct mdu_disk_info_s;
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
@@ -327,8 +327,9 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
 void r5c_check_stripe_cache_usage(struct r5conf *conf)
 {
 	int total_cached;
+	struct r5l_log *log = READ_ONCE(conf->log);

-	if (!r5c_is_writeback(conf->log))
+	if (!r5c_is_writeback(log))
 		return;

 	total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -344,7 +345,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
 	 */
 	if (total_cached > conf->min_nr_stripes * 1 / 2 ||
 	    atomic_read(&conf->empty_inactive_list_nr) > 0)
-		r5l_wake_reclaim(conf->log, 0);
+		r5l_wake_reclaim(log, 0);
 }

 /*
@@ -353,7 +354,9 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
  */
 void r5c_check_cached_full_stripe(struct r5conf *conf)
 {
-	if (!r5c_is_writeback(conf->log))
+	struct r5l_log *log = READ_ONCE(conf->log);
+
+	if (!r5c_is_writeback(log))
 		return;

 	/*
@@ -363,7 +366,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
 	if (atomic_read(&conf->r5c_cached_full_stripes) >=
 	    min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
 		conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
-		r5l_wake_reclaim(conf->log, 0);
+		r5l_wake_reclaim(log, 0);
 }

 /*
@@ -396,7 +399,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
  */
 static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);

 	if (!r5c_is_writeback(log))
 		return 0;
@@ -449,7 +452,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
 void r5c_make_stripe_write_out(struct stripe_head *sh)
 {
 	struct r5conf *conf = sh->raid_conf;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);

 	BUG_ON(!r5c_is_writeback(log));
@@ -491,7 +494,7 @@ static void r5c_handle_parity_cached(struct stripe_head *sh)
  */
 static void r5c_finish_cache_stripe(struct stripe_head *sh)
 {
-	struct r5l_log *log = sh->raid_conf->log;
+	struct r5l_log *log = READ_ONCE(sh->raid_conf->log);

 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
 		BUG_ON(test_bit(STRIPE_R5C_CACHING, &sh->state));
@@ -683,7 +686,6 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 					     disable_writeback_work);
 	struct mddev *mddev = log->rdev->mddev;
 	struct r5conf *conf = mddev->private;
-	int locked = 0;

 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH)
 		return;
@@ -692,14 +694,14 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 	/* wait superblock change before suspend */
 	wait_event(mddev->sb_wait,
-		   conf->log == NULL ||
-		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
-		    (locked = mddev_trylock(mddev))));
-	if (locked) {
-		mddev_suspend(mddev);
+		   !READ_ONCE(conf->log) ||
+		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
+
+	log = READ_ONCE(conf->log);
+	if (log) {
+		mddev_suspend(mddev, false);
 		log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
 		mddev_resume(mddev);
-		mddev_unlock(mddev);
 	}
 }
@@ -1151,7 +1153,7 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
 static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 {
 	struct stripe_head *sh;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	sector_t new_cp;
 	unsigned long flags;
@@ -1159,12 +1161,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 		return log->next_checkpoint;

 	spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
-	if (list_empty(&conf->log->stripe_in_journal_list)) {
+	if (list_empty(&log->stripe_in_journal_list)) {
 		/* all stripes flushed */
 		spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
 		return log->next_checkpoint;
 	}
-	sh = list_first_entry(&conf->log->stripe_in_journal_list,
+	sh = list_first_entry(&log->stripe_in_journal_list,
 			      struct stripe_head, r5c);
 	new_cp = sh->log_start;
 	spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
@@ -1399,7 +1401,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 	struct stripe_head *sh, *next;

 	lockdep_assert_held(&conf->device_lock);
-	if (!conf->log)
+	if (!READ_ONCE(conf->log))
 		return;

 	count = 0;
@@ -1420,7 +1422,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 static void r5c_do_reclaim(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	struct stripe_head *sh;
 	int count = 0;
 	unsigned long flags;
@@ -1549,7 +1551,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
 {
 	struct mddev *mddev = thread->mddev;
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);

 	if (!log)
 		return;
@@ -1591,7 +1593,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
 bool r5l_log_disk_error(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);

 	/* don't allow write if journal disk is missing */
 	if (!log)
@@ -2583,9 +2585,7 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
 	    mode == R5C_JOURNAL_MODE_WRITE_BACK)
 		return -EINVAL;

-	mddev_suspend(mddev);
 	conf->log->r5c_journal_mode = mode;
-	mddev_resume(mddev);

 	pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
 		 mdname(mddev), mode, r5c_journal_mode_str[mode]);
@@ -2610,11 +2610,11 @@ static ssize_t r5c_journal_mode_store(struct mddev *mddev,
 		if (strlen(r5c_journal_mode_str[mode]) == len &&
 		    !strncmp(page, r5c_journal_mode_str[mode], len))
 			break;
-	ret = mddev_lock(mddev);
+	ret = mddev_suspend_and_lock(mddev);
 	if (ret)
 		return ret;
 	ret = r5c_journal_mode_set(mddev, mode);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return ret ?: length;
 }
@@ -2635,7 +2635,7 @@ int r5c_try_caching_write(struct r5conf *conf,
 			  struct stripe_head_state *s,
 			  int disks)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	int i;
 	struct r5dev *dev;
 	int to_cache = 0;
@@ -2802,7 +2802,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
 				 struct stripe_head *sh,
 				 struct stripe_head_state *s)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	int i;
 	int do_wakeup = 0;
 	sector_t tree_index;
@@ -2941,7 +2941,7 @@ int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
 /* check whether this big stripe is in write back cache. */
 bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);
 	sector_t tree_index;
 	void *slot;
@@ -3049,14 +3049,14 @@ int r5l_start(struct r5l_log *log)
 void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
 {
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = READ_ONCE(conf->log);

 	if (!log)
 		return;

 	if ((raid5_calc_degraded(conf) > 0 ||
 	     test_bit(Journal, &rdev->flags)) &&
-	    conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
+	    log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
 		schedule_work(&log->disable_writeback_work);
 }
@@ -3145,7 +3145,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
 	spin_lock_init(&log->stripe_in_journal_lock);
 	atomic_set(&log->stripe_in_journal_count, 0);

-	conf->log = log;
+	WRITE_ONCE(conf->log, log);

 	set_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
 	return 0;
@@ -3173,7 +3173,7 @@ void r5l_exit_log(struct r5conf *conf)
 	 * 'reconfig_mutex' is held by caller, set 'confg->log' to NULL to
 	 * ensure disable_writeback_work wakes up and exits.
 	 */
-	conf->log = NULL;
+	WRITE_ONCE(conf->log, NULL);
 	wake_up(&conf->mddev->sb_wait);
 	flush_work(&log->disable_writeback_work);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
@@ -70,6 +70,8 @@ MODULE_PARM_DESC(devices_handle_discard_safely,
		 "Set to Y if all devices in each array reliably return zeroes on reads from discarded regions");
 static struct workqueue_struct *raid5_wq;

+static void raid5_quiesce(struct mddev *mddev, int quiesce);
+
 static inline struct hlist_head *stripe_hash(struct r5conf *conf, sector_t sect)
 {
 	int hash = (sect >> RAID5_STRIPE_SHIFT(conf)) & HASH_MASK;
@@ -2492,15 +2494,12 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 	unsigned long cpu;
 	int err = 0;

-	/*
-	 * Never shrink. And mddev_suspend() could deadlock if this is called
-	 * from raid5d. In that case, scribble_disks and scribble_sectors
-	 * should equal to new_disks and new_sectors
-	 */
+	/* Never shrink. */
 	if (conf->scribble_disks >= new_disks &&
 	    conf->scribble_sectors >= new_sectors)
 		return 0;

-	mddev_suspend(conf->mddev);
+	raid5_quiesce(conf->mddev, true);
+
 	cpus_read_lock();

 	for_each_present_cpu(cpu) {
@@ -2514,7 +2513,8 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
 	}
 	cpus_read_unlock();
-	mddev_resume(conf->mddev);
+	raid5_quiesce(conf->mddev, false);
+
 	if (!err) {
 		conf->scribble_disks = new_disks;
 		conf->scribble_sectors = new_sectors;
@@ -7025,7 +7025,7 @@ raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
 	    new != roundup_pow_of_two(new))
 		return -EINVAL;

-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
@@ -7049,7 +7049,6 @@ raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
 		goto out_unlock;
 	}

-	mddev_suspend(mddev);
 	mutex_lock(&conf->cache_size_mutex);
 	size = conf->max_nr_stripes;
@@ -7064,10 +7063,9 @@ raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
 			err = -ENOMEM;
 	}
 	mutex_unlock(&conf->cache_size_mutex);
-	mddev_resume(mddev);

 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
@@ -7153,7 +7151,7 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 		return -EINVAL;
 	new = !!new;

-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
@@ -7162,15 +7160,13 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 	else if (new != conf->skip_copy) {
 		struct request_queue *q = mddev->queue;

-		mddev_suspend(mddev);
 		conf->skip_copy = new;
 		if (new)
 			blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
 		else
 			blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
-		mddev_resume(mddev);
 	}
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
@@ -7225,15 +7221,13 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 	if (new > 8192)
 		return -EINVAL;

-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
 	if (!conf)
 		err = -ENODEV;
 	else if (new != conf->worker_cnt_per_group) {
-		mddev_suspend(mddev);
-
 		old_groups = conf->worker_groups;
 		if (old_groups)
 			flush_workqueue(raid5_wq);
@@ -7250,9 +7244,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 			kfree(old_groups[0].workers);
 			kfree(old_groups);
 		}
-		mddev_resume(mddev);
 	}
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err ?: len;
 }
@@ -8558,8 +8551,8 @@ static int raid5_start_reshape(struct mddev *mddev)
 	 * the reshape wasn't running - like Discard or Read - have
 	 * completed.
 	 */
-	mddev_suspend(mddev);
-	mddev_resume(mddev);
+	raid5_quiesce(mddev, true);
+	raid5_quiesce(mddev, false);

 	/* Add some new drives, as many as will fit.
 	 * We know there are enough to make the newly sized array work.
@@ -8974,12 +8967,12 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 	struct r5conf *conf;
 	int err;

-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
 	conf = mddev->private;
 	if (!conf) {
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 		return -ENODEV;
 	}
@@ -8989,19 +8982,14 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 			err = log_init(conf, NULL, true);
 			if (!err) {
 				err = resize_stripes(conf, conf->pool_size);
-				if (err) {
-					mddev_suspend(mddev);
+				if (err)
 					log_exit(conf);
-					mddev_resume(mddev);
-				}
 			}
 		} else
 			err = -EINVAL;
 	} else if (strncmp(buf, "resync", 6) == 0) {
 		if (raid5_has_ppl(conf)) {
-			mddev_suspend(mddev);
 			log_exit(conf);
-			mddev_resume(mddev);
 			err = resize_stripes(conf, conf->pool_size);
 		} else if (test_bit(MD_HAS_JOURNAL, &conf->mddev->flags) &&
 			   r5l_log_disk_error(conf)) {
@@ -9014,11 +9002,9 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 				break;
 			}

-			if (!journal_dev_exists) {
-				mddev_suspend(mddev);
+			if (!journal_dev_exists)
 				clear_bit(MD_HAS_JOURNAL, &mddev->flags);
-				mddev_resume(mddev);
-			} else /* need remove journal device first */
+			else /* need remove journal device first */
 				err = -EBUSY;
 		} else
 			err = -EINVAL;
@@ -9029,7 +9015,7 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 	if (!err)
 		md_update_sb(mddev, 1);

-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return err;
 }