Commit b0e5c294 authored by Linus Torvalds


Merge tag 'for-4.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - A couple stable fixes for the DM writecache target.

 - A stable fix for the DM cache target that fixes the potential for
   data corruption after an unclean shutdown of a cache device using
   writeback mode.

 - Update DM integrity target to allow the metadata to be stored on a
   separate device from data.

 - Fix DM kcopyd and the snapshot target to cond_resched() where
   appropriate and be more efficient with processing completed work.

 - A few fixes and improvements for DM crypt.

 - Add DM delay target feature to configure delay of flushes independent
   of writes.

 - Update DM thin-provisioning target to include metadata_low_watermark
   threshold in pool status.

 - Fix stale DM thin-provisioning Documentation.

* tag 'for-4.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
  dm writecache: fix a crash due to reading past end of dirty_bitmap
  dm crypt: don't decrease device limits
  dm cache metadata: set dirty on all cache blocks after a crash
  dm snapshot: remove stale FIXME in snapshot_map()
  dm snapshot: improve performance by switching out_of_order_list to rbtree
  dm kcopyd: avoid softlockup in run_complete_job
  dm cache metadata: save in-core policy_hint_size to on-disk superblock
  dm thin: stop no_space_timeout worker when switching to write-mode
  dm kcopyd: return void from dm_kcopyd_copy()
  dm thin: include metadata_low_watermark threshold in pool status
  dm writecache: report start_sector in status line
  dm crypt: convert essiv from ahash to shash
  dm crypt: use wake_up_process() instead of a wait queue
  dm integrity: recalculate checksums on creation
  dm integrity: flush journal on suspend when using separate metadata device
  dm integrity: use version 2 for separate metadata
  dm integrity: allow separate metadata device
  dm integrity: add ic->start in get_data_sector()
  dm integrity: report provided data sectors in the status
  dm integrity: implement fair range locks
  ...
parents 2645b9d1 1e1132ea
@@ -5,7 +5,8 @@ Device-Mapper's "delay" target delays reads and/or writes
 and maps them to different devices.
 
 Parameters:
-    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>]
+    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>
+                               [<flush_device> <flush_offset> <flush_delay>]]
 
 With separate write parameters, the first set is only used for reads.
 Offsets are specified in sectors.
...
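As an illustration only (the device name and sizes below are made up), a 9-argument table that leaves reads undelayed, delays writes by 50 ms and delays flushes by 100 ms could be loaded as:

    # dmsetup create delayed --table \
        "0 10485760 delay /dev/sdb1 0 0 /dev/sdb1 0 50 /dev/sdb1 0 100"

Judging from the constructor later in this series, a 6-argument table reuses the write parameters for flushes, and a 3-argument table uses the single parameter set for all three classes.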
@@ -113,6 +113,10 @@ internal_hash:algorithm(:key)	(the key is optional)
 	from an upper layer target, such as dm-crypt. The upper layer
 	target should check the validity of the integrity tags.
 
+recalculate
+	Recalculate the integrity tags automatically. It is only valid
+	when using internal hash.
+
 journal_crypt:algorithm(:key)	(the key is optional)
 	Encrypt the journal using given algorithm to make sure that the
 	attacker can't read the journal. You can use a block cipher here
...
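A purely illustrative table line for the new option (device, length and tag size are made up, and the overall argument order is assumed from the rest of this document rather than restated by this hunk): a journaled integrity device with an internal crc32c hash that recalculates its tags in the background might be created as:

    # dmsetup create integ --table \
        "0 1953792 integrity /dev/sdb1 0 4 J 2 internal_hash:crc32c recalculate"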
@@ -28,17 +28,18 @@ administrator some freedom, for example to:
 Status
 ======
 
-These targets are very much still in the EXPERIMENTAL state. Please
-do not yet rely on them in production. But do experiment and offer us
-feedback. Different use cases will have different performance
-characteristics, for example due to fragmentation of the data volume.
+These targets are considered safe for production use. But different use
+cases will have different performance characteristics, for example due
+to fragmentation of the data volume.
 
 If you find this software is not performing as expected please mail
 dm-devel@redhat.com with details and we'll try our best to improve
 things for you.
 
-Userspace tools for checking and repairing the metadata are under
-development.
+Userspace tools for checking and repairing the metadata have been fully
+developed and are available as 'thin_check' and 'thin_repair'. The name
+of the package that provides these utilities varies by distribution (on
+a Red Hat distribution it is named 'device-mapper-persistent-data').
 
 Cookbook
 ========
@@ -280,7 +281,7 @@ ii) Status
     <transaction id> <used metadata blocks>/<total metadata blocks>
     <used data blocks>/<total data blocks> <held metadata root>
     ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
-    needs_check|-
+    needs_check|- metadata_low_watermark
 
     transaction id:
 	A 64-bit number used by userspace to help synchronise with metadata
@@ -327,6 +328,11 @@ ii) Status
 	thin-pool can be made fully operational again. '-' indicates
 	needs_check is not set.
 
+    metadata_low_watermark:
+	Value of metadata low watermark in blocks. The kernel sets this
+	value internally but userspace needs to know this value to
+	determine if an event was caused by crossing this threshold.
+
 iii) Messages
 
     create_thin <dev id>
...
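For orientation, a hypothetical "dmsetup status" line for a healthy pool, with the new field appended at the end (all numbers are invented):

    # dmsetup status pool
    0 419430400 thin-pool 1 406/4096 20480/102400 - rw discard_passdown queue_if_no_space - 1024

Here the trailing 1024 is the metadata low watermark in blocks.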
@@ -363,7 +363,7 @@ static int __write_initial_superblock(struct dm_cache_metadata *cmd)
 	disk_super->version = cpu_to_le32(cmd->version);
 	memset(disk_super->policy_name, 0, sizeof(disk_super->policy_name));
 	memset(disk_super->policy_version, 0, sizeof(disk_super->policy_version));
-	disk_super->policy_hint_size = 0;
+	disk_super->policy_hint_size = cpu_to_le32(0);
 
 	__copy_sm_root(cmd, disk_super);
@@ -701,6 +701,7 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
 	disk_super->policy_version[0] = cpu_to_le32(cmd->policy_version[0]);
 	disk_super->policy_version[1] = cpu_to_le32(cmd->policy_version[1]);
 	disk_super->policy_version[2] = cpu_to_le32(cmd->policy_version[2]);
+	disk_super->policy_hint_size = cpu_to_le32(cmd->policy_hint_size);
 
 	disk_super->read_hits = cpu_to_le32(cmd->stats.read_hits);
 	disk_super->read_misses = cpu_to_le32(cmd->stats.read_misses);
@@ -1322,6 +1323,7 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd,
 	dm_oblock_t oblock;
 	unsigned flags;
+	bool dirty = true;
 
 	dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le);
 	memcpy(&mapping, mapping_value_le, sizeof(mapping));
@@ -1332,8 +1334,10 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd,
 		dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le);
 		memcpy(&hint, hint_value_le, sizeof(hint));
 	}
+	if (cmd->clean_when_opened)
+		dirty = flags & M_DIRTY;
 
-	r = fn(context, oblock, to_cblock(cb), flags & M_DIRTY,
+	r = fn(context, oblock, to_cblock(cb), dirty,
 	       le32_to_cpu(hint), hints_valid);
 	if (r) {
 		DMERR("policy couldn't load cache block %llu",
@@ -1361,7 +1365,7 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd,
 	dm_oblock_t oblock;
 	unsigned flags;
-	bool dirty;
+	bool dirty = true;
 
 	dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le);
 	memcpy(&mapping, mapping_value_le, sizeof(mapping));
@@ -1372,8 +1376,9 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd,
 		dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le);
 		memcpy(&hint, hint_value_le, sizeof(hint));
 	}
 
-	dirty = dm_bitset_cursor_get_value(dirty_cursor);
+	if (cmd->clean_when_opened)
+		dirty = dm_bitset_cursor_get_value(dirty_cursor);
 
 	r = fn(context, oblock, to_cblock(cb), dirty,
 	       le32_to_cpu(hint), hints_valid);
 	if (r) {
...
@@ -1188,9 +1188,8 @@ static void copy_complete(int read_err, unsigned long write_err, void *context)
 	queue_continuation(mg->cache->wq, &mg->k);
 }
 
-static int copy(struct dm_cache_migration *mg, bool promote)
+static void copy(struct dm_cache_migration *mg, bool promote)
 {
-	int r;
 	struct dm_io_region o_region, c_region;
 	struct cache *cache = mg->cache;
@@ -1203,11 +1202,9 @@ static int copy(struct dm_cache_migration *mg, bool promote)
 	c_region.count = cache->sectors_per_block;
 
 	if (promote)
-		r = dm_kcopyd_copy(cache->copier, &o_region, 1, &c_region, 0, copy_complete, &mg->k);
+		dm_kcopyd_copy(cache->copier, &o_region, 1, &c_region, 0, copy_complete, &mg->k);
 	else
-		r = dm_kcopyd_copy(cache->copier, &c_region, 1, &o_region, 0, copy_complete, &mg->k);
-
-	return r;
+		dm_kcopyd_copy(cache->copier, &c_region, 1, &o_region, 0, copy_complete, &mg->k);
 }
 
 static void bio_drop_shared_lock(struct cache *cache, struct bio *bio)
@@ -1449,12 +1446,7 @@ static void mg_full_copy(struct work_struct *ws)
 	}
 
 	init_continuation(&mg->k, mg_upgrade_lock);
-
-	if (copy(mg, is_policy_promote)) {
-		DMERR_LIMIT("%s: migration copy failed", cache_device_name(cache));
-		mg->k.input = BLK_STS_IOERR;
-		mg_complete(mg, false);
-	}
+	copy(mg, is_policy_promote);
 }
 
 static void mg_copy(struct work_struct *ws)
@@ -2250,7 +2242,7 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 		{0, 2, "Invalid number of cache feature arguments"},
 	};
 
-	int r;
+	int r, mode_ctr = 0;
 	unsigned argc;
 	const char *arg;
 	struct cache_features *cf = &ca->features;
@@ -2264,14 +2256,20 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 	while (argc--) {
 		arg = dm_shift_arg(as);
 
-		if (!strcasecmp(arg, "writeback"))
+		if (!strcasecmp(arg, "writeback")) {
 			cf->io_mode = CM_IO_WRITEBACK;
+			mode_ctr++;
+		}
 
-		else if (!strcasecmp(arg, "writethrough"))
+		else if (!strcasecmp(arg, "writethrough")) {
 			cf->io_mode = CM_IO_WRITETHROUGH;
+			mode_ctr++;
+		}
 
-		else if (!strcasecmp(arg, "passthrough"))
+		else if (!strcasecmp(arg, "passthrough")) {
 			cf->io_mode = CM_IO_PASSTHROUGH;
+			mode_ctr++;
+		}
 
 		else if (!strcasecmp(arg, "metadata2"))
 			cf->metadata_version = 2;
@@ -2282,6 +2280,11 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 		}
 	}
 
+	if (mode_ctr > 1) {
+		*error = "Duplicate cache io_mode features requested";
+		return -EINVAL;
+	}
+
 	return 0;
 }
...
@@ -99,7 +99,7 @@ struct crypt_iv_operations {
 };
 
 struct iv_essiv_private {
-	struct crypto_ahash *hash_tfm;
+	struct crypto_shash *hash_tfm;
 	u8 *salt;
 };
@@ -144,7 +144,7 @@ struct crypt_config {
 	struct workqueue_struct *io_queue;
 	struct workqueue_struct *crypt_queue;
 
-	wait_queue_head_t write_thread_wait;
+	spinlock_t write_thread_lock;
 	struct task_struct *write_thread;
 	struct rb_root write_tree;
@@ -327,25 +327,22 @@ static int crypt_iv_plain64be_gen(struct crypt_config *cc, u8 *iv,
 static int crypt_iv_essiv_init(struct crypt_config *cc)
 {
 	struct iv_essiv_private *essiv = &cc->iv_gen_private.essiv;
-	AHASH_REQUEST_ON_STACK(req, essiv->hash_tfm);
-	struct scatterlist sg;
+	SHASH_DESC_ON_STACK(desc, essiv->hash_tfm);
 	struct crypto_cipher *essiv_tfm;
 	int err;
 
-	sg_init_one(&sg, cc->key, cc->key_size);
-	ahash_request_set_tfm(req, essiv->hash_tfm);
-	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
-	ahash_request_set_crypt(req, &sg, essiv->salt, cc->key_size);
+	desc->tfm = essiv->hash_tfm;
+	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
 
-	err = crypto_ahash_digest(req);
-	ahash_request_zero(req);
+	err = crypto_shash_digest(desc, cc->key, cc->key_size, essiv->salt);
+	shash_desc_zero(desc);
 	if (err)
 		return err;
 
 	essiv_tfm = cc->iv_private;
 
 	err = crypto_cipher_setkey(essiv_tfm, essiv->salt,
-				   crypto_ahash_digestsize(essiv->hash_tfm));
+				   crypto_shash_digestsize(essiv->hash_tfm));
 	if (err)
 		return err;
@@ -356,7 +353,7 @@ static int crypt_iv_essiv_init(struct crypt_config *cc)
 static int crypt_iv_essiv_wipe(struct crypt_config *cc)
 {
 	struct iv_essiv_private *essiv = &cc->iv_gen_private.essiv;
-	unsigned salt_size = crypto_ahash_digestsize(essiv->hash_tfm);
+	unsigned salt_size = crypto_shash_digestsize(essiv->hash_tfm);
 	struct crypto_cipher *essiv_tfm;
 	int r, err = 0;
@@ -408,7 +405,7 @@ static void crypt_iv_essiv_dtr(struct crypt_config *cc)
 	struct crypto_cipher *essiv_tfm;
 	struct iv_essiv_private *essiv = &cc->iv_gen_private.essiv;
 
-	crypto_free_ahash(essiv->hash_tfm);
+	crypto_free_shash(essiv->hash_tfm);
 	essiv->hash_tfm = NULL;
 	kzfree(essiv->salt);
@@ -426,7 +423,7 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 			      const char *opts)
 {
 	struct crypto_cipher *essiv_tfm = NULL;
-	struct crypto_ahash *hash_tfm = NULL;
+	struct crypto_shash *hash_tfm = NULL;
 	u8 *salt = NULL;
 	int err;
@@ -436,14 +433,14 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 	}
 
 	/* Allocate hash algorithm */
-	hash_tfm = crypto_alloc_ahash(opts, 0, CRYPTO_ALG_ASYNC);
+	hash_tfm = crypto_alloc_shash(opts, 0, 0);
 	if (IS_ERR(hash_tfm)) {
 		ti->error = "Error initializing ESSIV hash";
 		err = PTR_ERR(hash_tfm);
 		goto bad;
 	}
 
-	salt = kzalloc(crypto_ahash_digestsize(hash_tfm), GFP_KERNEL);
+	salt = kzalloc(crypto_shash_digestsize(hash_tfm), GFP_KERNEL);
 	if (!salt) {
 		ti->error = "Error kmallocing salt storage in ESSIV";
 		err = -ENOMEM;
@@ -454,7 +451,7 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 	cc->iv_gen_private.essiv.hash_tfm = hash_tfm;
 
 	essiv_tfm = alloc_essiv_cipher(cc, ti, salt,
-				       crypto_ahash_digestsize(hash_tfm));
+				       crypto_shash_digestsize(hash_tfm));
 	if (IS_ERR(essiv_tfm)) {
 		crypt_iv_essiv_dtr(cc);
 		return PTR_ERR(essiv_tfm);
@@ -465,7 +462,7 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 bad:
 	if (hash_tfm && !IS_ERR(hash_tfm))
-		crypto_free_ahash(hash_tfm);
+		crypto_free_shash(hash_tfm);
 	kfree(salt);
 	return err;
 }
@@ -1620,36 +1617,31 @@ static int dmcrypt_write(void *data)
 		struct rb_root write_tree;
 		struct blk_plug plug;
-		DECLARE_WAITQUEUE(wait, current);
 
-		spin_lock_irq(&cc->write_thread_wait.lock);
+		spin_lock_irq(&cc->write_thread_lock);
 continue_locked:
 
 		if (!RB_EMPTY_ROOT(&cc->write_tree))
 			goto pop_from_list;
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		__add_wait_queue(&cc->write_thread_wait, &wait);
 
-		spin_unlock_irq(&cc->write_thread_wait.lock);
+		spin_unlock_irq(&cc->write_thread_lock);
 
 		if (unlikely(kthread_should_stop())) {
 			set_current_state(TASK_RUNNING);
-			remove_wait_queue(&cc->write_thread_wait, &wait);
 			break;
 		}
 
 		schedule();
 
 		set_current_state(TASK_RUNNING);
-		spin_lock_irq(&cc->write_thread_wait.lock);
-		__remove_wait_queue(&cc->write_thread_wait, &wait);
+		spin_lock_irq(&cc->write_thread_lock);
 		goto continue_locked;
 
 pop_from_list:
 		write_tree = cc->write_tree;
 		cc->write_tree = RB_ROOT;
-		spin_unlock_irq(&cc->write_thread_wait.lock);
+		spin_unlock_irq(&cc->write_thread_lock);
 
 		BUG_ON(rb_parent(write_tree.rb_node));
@@ -1693,7 +1685,9 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 		return;
 	}
 
-	spin_lock_irqsave(&cc->write_thread_wait.lock, flags);
+	spin_lock_irqsave(&cc->write_thread_lock, flags);
+	if (RB_EMPTY_ROOT(&cc->write_tree))
+		wake_up_process(cc->write_thread);
 	rbp = &cc->write_tree.rb_node;
 	parent = NULL;
 	sector = io->sector;
@@ -1706,9 +1700,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 	}
 	rb_link_node(&io->rb_node, parent, rbp);
 	rb_insert_color(&io->rb_node, &cc->write_tree);
-
-	wake_up_locked(&cc->write_thread_wait);
-	spin_unlock_irqrestore(&cc->write_thread_wait.lock, flags);
+	spin_unlock_irqrestore(&cc->write_thread_lock, flags);
 }
 
 static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
@@ -2831,7 +2823,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	init_waitqueue_head(&cc->write_thread_wait);
+	spin_lock_init(&cc->write_thread_lock);
 	cc->write_tree = RB_ROOT;
 
 	cc->write_thread = kthread_create(dmcrypt_write, cc, "dmcrypt_write");
@@ -3069,11 +3061,11 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	 */
 	limits->max_segment_size = PAGE_SIZE;
 
-	if (cc->sector_size != (1 << SECTOR_SHIFT)) {
-		limits->logical_block_size = cc->sector_size;
-		limits->physical_block_size = cc->sector_size;
-		blk_limits_io_min(limits, cc->sector_size);
-	}
+	limits->logical_block_size =
+		max_t(unsigned short, limits->logical_block_size, cc->sector_size);
+	limits->physical_block_size =
+		max_t(unsigned, limits->physical_block_size, cc->sector_size);
+	limits->io_min = max_t(unsigned, limits->io_min, cc->sector_size);
 }
 
 static struct target_type crypt_target = {
...
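For readers less familiar with the synchronous hash API that dm-crypt switches to above, here is a minimal, self-contained sketch of the same one-shot digest pattern. The function name and the choice of sha256 are illustrative only, not part of this patch; the salt buffer must be at least crypto_shash_digestsize() bytes.

	#include <linux/err.h>
	#include <crypto/hash.h>

	/* Sketch only: derive an ESSIV-style salt by hashing a key with a
	 * synchronous hash (shash), as dm-crypt now does. Error handling
	 * is reduced to the minimum. */
	static int essiv_salt_sketch(const u8 *key, unsigned int keylen, u8 *salt)
	{
		struct crypto_shash *tfm;
		int err;

		tfm = crypto_alloc_shash("sha256", 0, 0);	/* was crypto_alloc_ahash() */
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		{
			SHASH_DESC_ON_STACK(desc, tfm);

			desc->tfm = tfm;
			desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;	/* 4.19-era shash still has ->flags */
			err = crypto_shash_digest(desc, key, keylen, salt);
			shash_desc_zero(desc);
		}

		crypto_free_shash(tfm);
		return err;
	}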
@@ -17,6 +17,13 @@
 
 #define DM_MSG_PREFIX "delay"
 
+struct delay_class {
+	struct dm_dev *dev;
+	sector_t start;
+	unsigned delay;
+	unsigned ops;
+};
+
 struct delay_c {
 	struct timer_list delay_timer;
 	struct mutex timer_lock;
@@ -25,19 +32,16 @@ struct delay_c {
 	struct list_head delayed_bios;
 	atomic_t may_delay;
 
-	struct dm_dev *dev_read;
-	sector_t start_read;
-	unsigned read_delay;
-	unsigned reads;
+	struct delay_class read;
+	struct delay_class write;
+	struct delay_class flush;
 
-	struct dm_dev *dev_write;
-	sector_t start_write;
-	unsigned write_delay;
-	unsigned writes;
+	int argc;
 };
 
 struct dm_delay_info {
 	struct delay_c *context;
+	struct delay_class *class;
 	struct list_head list;
 	unsigned long expires;
 };
@@ -77,7 +81,7 @@ static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
 {
 	struct dm_delay_info *delayed, *next;
 	unsigned long next_expires = 0;
-	int start_timer = 0;
+	unsigned long start_timer = 0;
 	struct bio_list flush_bios = { };
 
 	mutex_lock(&delayed_bios_lock);
@@ -87,10 +91,7 @@ static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
 					sizeof(struct dm_delay_info));
 			list_del(&delayed->list);
 			bio_list_add(&flush_bios, bio);
-			if ((bio_data_dir(bio) == WRITE))
-				delayed->context->writes--;
-			else
-				delayed->context->reads--;
+			delayed->class->ops--;
 			continue;
 		}
@@ -100,7 +101,6 @@ static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
 		} else
 			next_expires = min(next_expires, delayed->expires);
 	}
-
 	mutex_unlock(&delayed_bios_lock);
 
 	if (start_timer)
@@ -117,6 +117,50 @@ static void flush_expired_bios(struct work_struct *work)
 	flush_bios(flush_delayed_bios(dc, 0));
 }
 
+static void delay_dtr(struct dm_target *ti)
+{
+	struct delay_c *dc = ti->private;
+
+	destroy_workqueue(dc->kdelayd_wq);
+
+	if (dc->read.dev)
+		dm_put_device(ti, dc->read.dev);
+	if (dc->write.dev)
+		dm_put_device(ti, dc->write.dev);
+	if (dc->flush.dev)
+		dm_put_device(ti, dc->flush.dev);
+
+	mutex_destroy(&dc->timer_lock);
+
+	kfree(dc);
+}
+
+static int delay_class_ctr(struct dm_target *ti, struct delay_class *c, char **argv)
+{
+	int ret;
+	unsigned long long tmpll;
+	char dummy;
+
+	if (sscanf(argv[1], "%llu%c", &tmpll, &dummy) != 1) {
+		ti->error = "Invalid device sector";
+		return -EINVAL;
+	}
+	c->start = tmpll;
+
+	if (sscanf(argv[2], "%u%c", &c->delay, &dummy) != 1) {
+		ti->error = "Invalid delay";
+		return -EINVAL;
+	}
+
+	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &c->dev);
+	if (ret) {
+		ti->error = "Device lookup failed";
+		return ret;
+	}
+
+	return 0;
+}
+
 /*
  * Mapping parameters:
  *    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>]
@@ -128,134 +172,89 @@ static void flush_expired_bios(struct work_struct *work)
 static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	struct delay_c *dc;
-	unsigned long long tmpll;
-	char dummy;
 	int ret;
 
-	if (argc != 3 && argc != 6) {
-		ti->error = "Requires exactly 3 or 6 arguments";
+	if (argc != 3 && argc != 6 && argc != 9) {
+		ti->error = "Requires exactly 3, 6 or 9 arguments";
 		return -EINVAL;
 	}
 
-	dc = kmalloc(sizeof(*dc), GFP_KERNEL);
+	dc = kzalloc(sizeof(*dc), GFP_KERNEL);
 	if (!dc) {
 		ti->error = "Cannot allocate context";
 		return -ENOMEM;
 	}
 
-	dc->reads = dc->writes = 0;
+	ti->private = dc;
+	timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
+	INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
+	INIT_LIST_HEAD(&dc->delayed_bios);
+	mutex_init(&dc->timer_lock);
+	atomic_set(&dc->may_delay, 1);
+	dc->argc = argc;
 
-	ret = -EINVAL;
-	if (sscanf(argv[1], "%llu%c", &tmpll, &dummy) != 1) {
-		ti->error = "Invalid device sector";
+	ret = delay_class_ctr(ti, &dc->read, argv);
+	if (ret)
 		goto bad;
-	}
-	dc->start_read = tmpll;
 
-	if (sscanf(argv[2], "%u%c", &dc->read_delay, &dummy) != 1) {
-		ti->error = "Invalid delay";
-		goto bad;
-	}
-
-	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
-			    &dc->dev_read);
-	if (ret) {
-		ti->error = "Device lookup failed";
-		goto bad;
-	}
-
-	ret = -EINVAL;
-	dc->dev_write = NULL;
-	if (argc == 3)
+	if (argc == 3) {
+		ret = delay_class_ctr(ti, &dc->write, argv);
+		if (ret)
+			goto bad;
+		ret = delay_class_ctr(ti, &dc->flush, argv);
+		if (ret)
+			goto bad;
 		goto out;
-
-	if (sscanf(argv[4], "%llu%c", &tmpll, &dummy) != 1) {
-		ti->error = "Invalid write device sector";
-		goto bad_dev_read;
 	}
-	dc->start_write = tmpll;
 
-	if (sscanf(argv[5], "%u%c", &dc->write_delay, &dummy) != 1) {
-		ti->error = "Invalid write delay";
-		goto bad_dev_read;
+	ret = delay_class_ctr(ti, &dc->write, argv + 3);
+	if (ret)
+		goto bad;
+	if (argc == 6) {
+		ret = delay_class_ctr(ti, &dc->flush, argv + 3);
+		if (ret)
+			goto bad;
+		goto out;
 	}
 
-	ret = dm_get_device(ti, argv[3], dm_table_get_mode(ti->table),
-			    &dc->dev_write);
-	if (ret) {
-		ti->error = "Write device lookup failed";
-		goto bad_dev_read;
-	}
+	ret = delay_class_ctr(ti, &dc->flush, argv + 6);
+	if (ret)
+		goto bad;
 
 out:
-	ret = -EINVAL;
 	dc->kdelayd_wq = alloc_workqueue("kdelayd", WQ_MEM_RECLAIM, 0);
 	if (!dc->kdelayd_wq) {
+		ret = -EINVAL;
 		DMERR("Couldn't start kdelayd");
-		goto bad_queue;
+		goto bad;
 	}
 
-	timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
-
-	INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
-	INIT_LIST_HEAD(&dc->delayed_bios);
-	mutex_init(&dc->timer_lock);
-	atomic_set(&dc->may_delay, 1);
-
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
 	ti->per_io_data_size = sizeof(struct dm_delay_info);
-	ti->private = dc;
 	return 0;
 
-bad_queue:
-	if (dc->dev_write)
-		dm_put_device(ti, dc->dev_write);
-bad_dev_read:
-	dm_put_device(ti, dc->dev_read);
 bad:
-	kfree(dc);
+	delay_dtr(ti);
 	return ret;
 }
 
-static void delay_dtr(struct dm_target *ti)
-{
-	struct delay_c *dc = ti->private;
-
-	destroy_workqueue(dc->kdelayd_wq);
-
-	dm_put_device(ti, dc->dev_read);
-	if (dc->dev_write)
-		dm_put_device(ti, dc->dev_write);
-
-	mutex_destroy(&dc->timer_lock);
-
-	kfree(dc);
-}
-
-static int delay_bio(struct delay_c *dc, int delay, struct bio *bio)
+static int delay_bio(struct delay_c *dc, struct delay_class *c, struct bio *bio)
 {
 	struct dm_delay_info *delayed;
 	unsigned long expires = 0;
 
-	if (!delay || !atomic_read(&dc->may_delay))
+	if (!c->delay || !atomic_read(&dc->may_delay))
 		return DM_MAPIO_REMAPPED;
 
 	delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info));
 
 	delayed->context = dc;
-	delayed->expires = expires = jiffies + msecs_to_jiffies(delay);
+	delayed->expires = expires = jiffies + msecs_to_jiffies(c->delay);
 
 	mutex_lock(&delayed_bios_lock);
-
-	if (bio_data_dir(bio) == WRITE)
-		dc->writes++;
-	else
-		dc->reads++;
-
+	c->ops++;
 	list_add_tail(&delayed->list, &dc->delayed_bios);
-
 	mutex_unlock(&delayed_bios_lock);
 
 	queue_timeout(dc, expires);
@@ -282,23 +281,28 @@ static void delay_resume(struct dm_target *ti)
 static int delay_map(struct dm_target *ti, struct bio *bio)
 {
 	struct delay_c *dc = ti->private;
+	struct delay_class *c;
+	struct dm_delay_info *delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info));
 
-	if ((bio_data_dir(bio) == WRITE) && (dc->dev_write)) {
-		bio_set_dev(bio, dc->dev_write->bdev);
-		if (bio_sectors(bio))
-			bio->bi_iter.bi_sector = dc->start_write +
-				dm_target_offset(ti, bio->bi_iter.bi_sector);
-
-		return delay_bio(dc, dc->write_delay, bio);
+	if (bio_data_dir(bio) == WRITE) {
+		if (unlikely(bio->bi_opf & REQ_PREFLUSH))
+			c = &dc->flush;
+		else
+			c = &dc->write;
+	} else {
+		c = &dc->read;
 	}
+	delayed->class = c;
+	bio_set_dev(bio, c->dev->bdev);
+	if (bio_sectors(bio))
+		bio->bi_iter.bi_sector = c->start + dm_target_offset(ti, bio->bi_iter.bi_sector);
 
-	bio_set_dev(bio, dc->dev_read->bdev);
-	bio->bi_iter.bi_sector = dc->start_read +
-		dm_target_offset(ti, bio->bi_iter.bi_sector);
-
-	return delay_bio(dc, dc->read_delay, bio);
+	return delay_bio(dc, c, bio);
 }
 
+#define DMEMIT_DELAY_CLASS(c) \
+	DMEMIT("%s %llu %u", (c)->dev->name, (unsigned long long)(c)->start, (c)->delay)
+
 static void delay_status(struct dm_target *ti, status_type_t type,
 			 unsigned status_flags, char *result, unsigned maxlen)
 {
@@ -307,17 +311,19 @@ static void delay_status(struct dm_target *ti, status_type_t type,
 	switch (type) {
 	case STATUSTYPE_INFO:
-		DMEMIT("%u %u", dc->reads, dc->writes);
+		DMEMIT("%u %u %u", dc->read.ops, dc->write.ops, dc->flush.ops);
 		break;
 
 	case STATUSTYPE_TABLE:
-		DMEMIT("%s %llu %u", dc->dev_read->name,
-		       (unsigned long long) dc->start_read,
-		       dc->read_delay);
-		if (dc->dev_write)
-			DMEMIT(" %s %llu %u", dc->dev_write->name,
-			       (unsigned long long) dc->start_write,
-			       dc->write_delay);
+		DMEMIT_DELAY_CLASS(&dc->read);
+		if (dc->argc >= 6) {
+			DMEMIT(" ");
+			DMEMIT_DELAY_CLASS(&dc->write);
+		}
+		if (dc->argc >= 9) {
+			DMEMIT(" ");
+			DMEMIT_DELAY_CLASS(&dc->flush);
+		}
 		break;
 	}
 }
@@ -328,12 +334,15 @@ static int delay_iterate_devices(struct dm_target *ti,
 	struct delay_c *dc = ti->private;
 	int ret = 0;
 
-	ret = fn(ti, dc->dev_read, dc->start_read, ti->len, data);
+	ret = fn(ti, dc->read.dev, dc->read.start, ti->len, data);
+	if (ret)
+		goto out;
+	ret = fn(ti, dc->write.dev, dc->write.start, ti->len, data);
+	if (ret)
+		goto out;
+	ret = fn(ti, dc->flush.dev, dc->flush.start, ti->len, data);
 	if (ret)
 		goto out;
-
-	if (dc->dev_write)
-		ret = fn(ti, dc->dev_write, dc->start_write, ti->len, data);
 
 out:
 	return ret;
...
This diff is collapsed.
@@ -487,6 +487,8 @@ static int run_complete_job(struct kcopyd_job *job)
 	if (atomic_dec_and_test(&kc->nr_jobs))
 		wake_up(&kc->destroyq);
 
+	cond_resched();
+
 	return 0;
 }
@@ -741,7 +743,7 @@ static void split_job(struct kcopyd_job *master_job)
 	}
 }
 
-int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
+void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		   unsigned int num_dests, struct dm_io_region *dests,
 		   unsigned int flags, dm_kcopyd_notify_fn fn, void *context)
 {
@@ -818,16 +820,14 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		job->progress = 0;
 		split_job(job);
 	}
-
-	return 0;
 }
 EXPORT_SYMBOL(dm_kcopyd_copy);
 
-int dm_kcopyd_zero(struct dm_kcopyd_client *kc,
+void dm_kcopyd_zero(struct dm_kcopyd_client *kc,
 		   unsigned num_dests, struct dm_io_region *dests,
 		   unsigned flags, dm_kcopyd_notify_fn fn, void *context)
 {
-	return dm_kcopyd_copy(kc, NULL, num_dests, dests, flags, fn, context);
+	dm_kcopyd_copy(kc, NULL, num_dests, dests, flags, fn, context);
 }
 EXPORT_SYMBOL(dm_kcopyd_zero);
...
@@ -326,9 +326,8 @@ static void recovery_complete(int read_err, unsigned long write_err,
 	dm_rh_recovery_end(reg, !(read_err || write_err));
 }
 
-static int recover(struct mirror_set *ms, struct dm_region *reg)
+static void recover(struct mirror_set *ms, struct dm_region *reg)
 {
-	int r;
 	unsigned i;
 	struct dm_io_region from, to[DM_KCOPYD_MAX_REGIONS], *dest;
 	struct mirror *m;
@@ -367,10 +366,8 @@ static int recover(struct mirror_set *ms, struct dm_region *reg)
 	if (!errors_handled(ms))
 		set_bit(DM_KCOPYD_IGNORE_ERROR, &flags);
 
-	r = dm_kcopyd_copy(ms->kcopyd_client, &from, ms->nr_mirrors - 1, to,
-			   flags, recovery_complete, reg);
-
-	return r;
+	dm_kcopyd_copy(ms->kcopyd_client, &from, ms->nr_mirrors - 1, to,
+		       flags, recovery_complete, reg);
 }
 
 static void reset_ms_flags(struct mirror_set *ms)
@@ -388,7 +385,6 @@ static void do_recovery(struct mirror_set *ms)
 {
 	struct dm_region *reg;
 	struct dm_dirty_log *log = dm_rh_dirty_log(ms->rh);
-	int r;
 
 	/*
 	 * Start quiescing some regions.
@@ -398,11 +394,8 @@ static void do_recovery(struct mirror_set *ms)
 	/*
 	 * Copy any already quiesced regions.
 	 */
-	while ((reg = dm_rh_recovery_start(ms->rh))) {
-		r = recover(ms, reg);
-		if (r)
-			dm_rh_recovery_end(reg, 0);
-	}
+	while ((reg = dm_rh_recovery_start(ms->rh)))
+		recover(ms, reg);
 
 	/*
 	 * Update the in sync flag.
...
@@ -85,7 +85,7 @@ struct dm_snapshot {
 	 * A list of pending exceptions that completed out of order.
 	 * Protected by kcopyd single-threaded callback.
 	 */
-	struct list_head out_of_order_list;
+	struct rb_root out_of_order_tree;
 
 	mempool_t pending_pool;
@@ -200,7 +200,7 @@ struct dm_snap_pending_exception {
 	/* A sequence number, it is used for in-order completion. */
 	sector_t exception_sequence;
 
-	struct list_head out_of_order_entry;
+	struct rb_node out_of_order_node;
 
 	/*
 	 * For writing a complete chunk, bypassing the copy.
@@ -1173,7 +1173,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	atomic_set(&s->pending_exceptions_count, 0);
 	s->exception_start_sequence = 0;
 	s->exception_complete_sequence = 0;
-	INIT_LIST_HEAD(&s->out_of_order_list);
+	s->out_of_order_tree = RB_ROOT;
 	mutex_init(&s->lock);
 	INIT_LIST_HEAD(&s->list);
 	spin_lock_init(&s->pe_lock);
@@ -1539,28 +1539,41 @@ static void copy_callback(int read_err, unsigned long write_err, void *context)
 	pe->copy_error = read_err || write_err;
 
 	if (pe->exception_sequence == s->exception_complete_sequence) {
+		struct rb_node *next;
+
 		s->exception_complete_sequence++;
 		complete_exception(pe);
 
-		while (!list_empty(&s->out_of_order_list)) {
-			pe = list_entry(s->out_of_order_list.next,
-					struct dm_snap_pending_exception, out_of_order_entry);
+		next = rb_first(&s->out_of_order_tree);
+		while (next) {
+			pe = rb_entry(next, struct dm_snap_pending_exception,
+				      out_of_order_node);
 			if (pe->exception_sequence != s->exception_complete_sequence)
 				break;
+			next = rb_next(next);
 			s->exception_complete_sequence++;
-			list_del(&pe->out_of_order_entry);
+			rb_erase(&pe->out_of_order_node, &s->out_of_order_tree);
 			complete_exception(pe);
+			cond_resched();
 		}
 	} else {
-		struct list_head *lh;
+		struct rb_node *parent = NULL;
+		struct rb_node **p = &s->out_of_order_tree.rb_node;
 		struct dm_snap_pending_exception *pe2;
 
-		list_for_each_prev(lh, &s->out_of_order_list) {
-			pe2 = list_entry(lh, struct dm_snap_pending_exception, out_of_order_entry);
-			if (pe2->exception_sequence < pe->exception_sequence)
-				break;
+		while (*p) {
+			pe2 = rb_entry(*p, struct dm_snap_pending_exception, out_of_order_node);
+			parent = *p;
+
+			BUG_ON(pe->exception_sequence == pe2->exception_sequence);
+			if (pe->exception_sequence < pe2->exception_sequence)
+				p = &((*p)->rb_left);
+			else
+				p = &((*p)->rb_right);
 		}
-		list_add(&pe->out_of_order_entry, lh);
+
+		rb_link_node(&pe->out_of_order_node, parent, p);
+		rb_insert_color(&pe->out_of_order_node, &s->out_of_order_tree);
 	}
 }
@@ -1694,8 +1707,6 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	if (!s->valid)
 		return DM_MAPIO_KILL;
 
-	/* FIXME: should only take write lock if we need
-	 * to copy an exception */
 	mutex_lock(&s->lock);
 
 	if (!s->valid || (unlikely(s->snapshot_overflowed) &&
...
@@ -1220,18 +1220,13 @@ static struct dm_thin_new_mapping *get_next_mapping(struct pool *pool)
 static void ll_zero(struct thin_c *tc, struct dm_thin_new_mapping *m,
 		    sector_t begin, sector_t end)
 {
-	int r;
 	struct dm_io_region to;
 
 	to.bdev = tc->pool_dev->bdev;
 	to.sector = begin;
 	to.count = end - begin;
 
-	r = dm_kcopyd_zero(tc->pool->copier, 1, &to, 0, copy_complete, m);
-	if (r < 0) {
-		DMERR_LIMIT("dm_kcopyd_zero() failed");
-		copy_complete(1, 1, m);
-	}
+	dm_kcopyd_zero(tc->pool->copier, 1, &to, 0, copy_complete, m);
 }
 
 static void remap_and_issue_overwrite(struct thin_c *tc, struct bio *bio,
@@ -1257,7 +1252,6 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 			  struct dm_bio_prison_cell *cell, struct bio *bio,
 			  sector_t len)
 {
-	int r;
 	struct pool *pool = tc->pool;
 	struct dm_thin_new_mapping *m = get_next_mapping(pool);
@@ -1296,19 +1290,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 		to.sector = data_dest * pool->sectors_per_block;
 		to.count = len;
 
-		r = dm_kcopyd_copy(pool->copier, &from, 1, &to,
-				   0, copy_complete, m);
-		if (r < 0) {
-			DMERR_LIMIT("dm_kcopyd_copy() failed");
-			copy_complete(1, 1, m);
-
-			/*
-			 * We allow the zero to be issued, to simplify the
-			 * error path. Otherwise we'd need to start
-			 * worrying about decrementing the prepare_actions
-			 * counter.
-			 */
-		}
+		dm_kcopyd_copy(pool->copier, &from, 1, &to,
+			       0, copy_complete, m);
 
 		/*
 		 * Do we need to zero a tail region?
@@ -2520,6 +2503,8 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 	case PM_WRITE:
 		if (old_mode != new_mode)
 			notify_of_pool_mode_change(pool, "write");
+		if (old_mode == PM_OUT_OF_DATA_SPACE)
+			cancel_delayed_work_sync(&pool->no_space_timeout);
 		pool->out_of_data_space = false;
 		pool->pf.error_if_no_space = pt->requested_pf.error_if_no_space;
 		dm_pool_metadata_read_write(pool->pmd);
@@ -3890,6 +3875,8 @@ static void pool_status(struct dm_target *ti, status_type_t type,
 		else
 			DMEMIT("- ");
 
+		DMEMIT("%llu ", (unsigned long long)calc_metadata_threshold(pt));
+
 		break;
 
 	case STATUSTYPE_TABLE:
@@ -3979,7 +3966,7 @@ static struct target_type pool_target = {
 	.name = "thin-pool",
 	.features = DM_TARGET_SINGLETON | DM_TARGET_ALWAYS_WRITEABLE |
 		    DM_TARGET_IMMUTABLE,
-	.version = {1, 19, 0},
+	.version = {1, 20, 0},
 	.module = THIS_MODULE,
 	.ctr = pool_ctr,
 	.dtr = pool_dtr,
@@ -4353,7 +4340,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 19, 0},
+	.version = {1, 20, 0},
 	.module = THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
...
@@ -457,7 +457,7 @@ static void ssd_commit_flushed(struct dm_writecache *wc)
 		COMPLETION_INITIALIZER_ONSTACK(endio.c),
 		ATOMIC_INIT(1),
 	};
-	unsigned bitmap_bits = wc->dirty_bitmap_size * BITS_PER_LONG;
+	unsigned bitmap_bits = wc->dirty_bitmap_size * 8;
 	unsigned i = 0;
 
 	while (1) {
@@ -2240,6 +2240,8 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("%c %s %s %u ", WC_MODE_PMEM(wc) ? 'p' : 's',
 		       wc->dev->name, wc->ssd_dev->name, wc->block_size);
 		extra_args = 0;
+		if (wc->start_sector)
+			extra_args += 2;
 		if (wc->high_wm_percent_set)
 			extra_args += 2;
 		if (wc->low_wm_percent_set)
@@ -2254,6 +2256,8 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
 			extra_args++;
 
 		DMEMIT("%u", extra_args);
+		if (wc->start_sector)
+			DMEMIT(" start_sector %llu", (unsigned long long)wc->start_sector);
 		if (wc->high_wm_percent_set) {
 			x = (uint64_t)wc->freelist_high_watermark * 100;
 			x += wc->n_blocks / 2;
@@ -2280,7 +2284,7 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
 static struct target_type writecache_target = {
 	.name = "writecache",
-	.version = {1, 1, 0},
+	.version = {1, 1, 1},
 	.module = THIS_MODULE,
 	.ctr = writecache_ctr,
 	.dtr = writecache_dtr,
...
@@ -161,10 +161,8 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 		/* Copy the valid region */
 		set_bit(DMZ_RECLAIM_KCOPY, &zrc->flags);
-		ret = dm_kcopyd_copy(zrc->kc, &src, 1, &dst, flags,
-				     dmz_reclaim_kcopy_end, zrc);
-		if (ret)
-			return ret;
+		dm_kcopyd_copy(zrc->kc, &src, 1, &dst, flags,
+			       dmz_reclaim_kcopy_end, zrc);
 
 		/* Wait for copy to complete */
 		wait_on_bit_io(&zrc->flags, DMZ_RECLAIM_KCOPY,
...
@@ -62,7 +62,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc);
 typedef void (*dm_kcopyd_notify_fn)(int read_err, unsigned long write_err,
 				    void *context);
 
-int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
+void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		   unsigned num_dests, struct dm_io_region *dests,
 		   unsigned flags, dm_kcopyd_notify_fn fn, void *context);
@@ -81,7 +81,7 @@ void *dm_kcopyd_prepare_callback(struct dm_kcopyd_client *kc,
 				 dm_kcopyd_notify_fn fn, void *context);
 void dm_kcopyd_do_callback(void *job, int read_err, unsigned long write_err);
 
-int dm_kcopyd_zero(struct dm_kcopyd_client *kc,
+void dm_kcopyd_zero(struct dm_kcopyd_client *kc,
 		   unsigned num_dests, struct dm_io_region *dests,
 		   unsigned flags, dm_kcopyd_notify_fn fn, void *context);
...
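Since dm_kcopyd_copy() and dm_kcopyd_zero() no longer return a value, callers learn about success or failure exclusively through the notify callback. A minimal caller sketch (the my_* names and the context structure are hypothetical, not taken from this series):

	#include <linux/completion.h>
	#include <linux/dm-kcopyd.h>
	#include <linux/dm-io.h>

	struct my_copy_ctx {
		struct completion done;
		bool failed;
	};

	/* Completion callback: errors arrive here, not as a return value. */
	static void my_copy_done(int read_err, unsigned long write_err, void *context)
	{
		struct my_copy_ctx *ctx = context;

		ctx->failed = read_err || write_err;
		complete(&ctx->done);
	}

	static void my_start_copy(struct dm_kcopyd_client *kc, struct my_copy_ctx *ctx,
				  struct dm_io_region *from, struct dm_io_region *to)
	{
		init_completion(&ctx->done);
		/* No return value to check; the job always finishes via my_copy_done(). */
		dm_kcopyd_copy(kc, from, 1, to, 0, my_copy_done, ctx);
	}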