Commit 6597ac8a authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'dm-4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - DM core cleanups:

     * blk-mq request-based DM no longer uses any mempools now that
       partial completions are no longer handled as part of cloned
       requests

 - DM raid cleanups and support for MD raid0

 - DM cache core advances and a new stochastic-multi-queue (smq) cache
   replacement policy

     * smq is the new default dm-cache policy

 - DM thinp cleanups and much more efficient large discard support

 - DM statistics support for request-based DM and nanosecond resolution
   timestamps

 - Fixes to DM stripe, DM log-writes, DM raid1 and DM crypt

* tag 'dm-4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (39 commits)
  dm stats: add support for request-based DM devices
  dm stats: collect and report histogram of IO latencies
  dm stats: support precise timestamps
  dm stats: fix divide by zero if 'number_of_areas' arg is zero
  dm cache: switch the "default" cache replacement policy from mq to smq
  dm space map metadata: fix occasional leak of a metadata block on resize
  dm thin metadata: fix a race when entering fail mode
  dm thin: fail messages with EOPNOTSUPP when pool cannot handle messages
  dm thin: range discard support
  dm thin metadata: add dm_thin_remove_range()
  dm thin metadata: add dm_thin_find_mapped_range()
  dm btree: add dm_btree_remove_leaves()
  dm stats: Use kvfree() in dm_kvfree()
  dm cache: age and write back cache entries even without active IO
  dm cache: prefix all DMERR and DMINFO messages with cache device name
  dm cache: add fail io mode and needs_check flag
  dm cache: wake the worker thread every time we free a migration object
  dm cache: add stochastic-multi-queue (smq) policy
  dm cache: boost promotion of blocks that will be overwritten
  dm cache: defer whole cells
  ...
parents e4bc13ad e262f347
......@@ -25,10 +25,10 @@ trying to see when the io scheduler has let the ios run.
Overview of supplied cache replacement policies
===============================================
multiqueue
----------
multiqueue (mq)
---------------
This policy is the default.
This policy has been deprecated in favor of the smq policy (see below).
The multiqueue policy has three sets of 16 queues: one set for entries
waiting for the cache and another two for those in the cache (a set for
......@@ -73,6 +73,67 @@ If you're trying to quickly warm a new cache device you may wish to
reduce these to encourage promotion. Remember to switch them back to
their defaults after the cache fills though.
Stochastic multiqueue (smq)
---------------------------
This policy is the default.
The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.
The smq policy (vs mq) offers the promise of less memory utilization,
improved performance and increased adaptability in the face of changing
workloads. SMQ also does not have any cumbersome tuning knobs.
Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target. Doing so will cause all of the
mq policy's hints to be dropped. Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.
Memory usage:
The mq policy uses a lot of memory; 88 bytes per cache block on a 64
bit machine.
SMQ uses 28bit indexes to implement it's data structures rather than
pointers. It avoids storing an explicit hit count for each block. It
has a 'hotspot' queue rather than a pre cache which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).
All these mean smq uses ~25bytes per cache block. Still a lot of
memory, but a substantial improvement nontheless.
Level balancing:
MQ places entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)). This means the bottom
levels generally have the most entries, and the top ones have very
few. Having unbalanced levels like this reduces the efficacy of the
multiqueue.
SMQ does not maintain a hit count, instead it swaps hit entries with
the least recently used entry from the level above. The over all
ordering being a side effect of this stochastic process. With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.
Adaptability:
The MQ policy maintains a hit count for each cache block. For a
different block to get promoted to the cache it's hit count has to
exceed the lowest currently in the cache. This means it can take a
long time for the cache to adapt between varying IO patterns.
Periodically degrading the hit counts could help with this, but I
haven't found a nice general solution.
SMQ doesn't maintain hit counts, so a lot of this problem just goes
away. In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote. If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels. This lets it adapt to new IO patterns very quickly.
Performance:
Testing SMQ shows substantially better performance than MQ.
cleaner
-------
......
......@@ -221,6 +221,7 @@ Status
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty> <#features> <features>*
<#core args> <core args>* <policy name> <#policy args> <policy args>*
<cache metadata mode>
metadata block size : Fixed block size for each metadata block in
sectors
......@@ -251,8 +252,12 @@ core args : Key/value pairs for tuning the core
e.g. migration_threshold
policy name : Name of the policy
#policy args : Number of policy arguments to follow (must be even)
policy args : Key/value pairs
e.g. sequential_threshold
policy args : Key/value pairs e.g. sequential_threshold
cache metadata mode : ro if read-only, rw if read-write
In serious cases where even a read-only mode is deemed unsafe
no further I/O will be permitted and the status will just
contain the string 'Fail'. The userspace recovery tools
should then be used.
Messages
--------
......
......@@ -224,3 +224,5 @@ Version History
New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
1.5.1 Add ability to restore transiently failed devices on resume.
1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check".
1.6.0 Add discard support (and devices_handle_discard_safely module param).
1.7.0 Add support for MD RAID0 mappings.
......@@ -13,9 +13,14 @@ the range specified.
The I/O statistics counters for each step-sized area of a region are
in the same format as /sys/block/*/stat or /proc/diskstats (see:
Documentation/iostats.txt). But two extra counters (12 and 13) are
provided: total time spent reading and writing in milliseconds. All
these counters may be accessed by sending the @stats_print message to
the appropriate DM device via dmsetup.
provided: total time spent reading and writing. When the histogram
argument is used, the 14th parameter is reported that represents the
histogram of latencies. All these counters may be accessed by sending
the @stats_print message to the appropriate DM device via dmsetup.
The reported times are in milliseconds and the granularity depends on
the kernel ticks. When the option precise_timestamps is used, the
reported times are in nanoseconds.
Each region has a corresponding unique identifier, which we call a
region_id, that is assigned when the region is created. The region_id
......@@ -33,7 +38,9 @@ memory is used by reading
Messages
========
@stats_create <range> <step> [<program_id> [<aux_data>]]
@stats_create <range> <step>
[<number_of_optional_arguments> <optional_arguments>...]
[<program_id> [<aux_data>]]
Create a new region and return the region_id.
......@@ -48,6 +55,29 @@ Messages
"/<number_of_areas>" - the range is subdivided into the specified
number of areas.
<number_of_optional_arguments>
The number of optional arguments
<optional_arguments>
The following optional arguments are supported
precise_timestamps - use precise timer with nanosecond resolution
instead of the "jiffies" variable. When this argument is
used, the resulting times are in nanoseconds instead of
milliseconds. Precise timestamps are a little bit slower
to obtain than jiffies-based timestamps.
histogram:n1,n2,n3,n4,... - collect histogram of latencies. The
numbers n1, n2, etc are times that represent the boundaries
of the histogram. If precise_timestamps is not used, the
times are in milliseconds, otherwise they are in
nanoseconds. For each range, the kernel will report the
number of requests that completed within this range. For
example, if we use "histogram:10,20,30", the kernel will
report four numbers a:b:c:d. a is the number of requests
that took 0-10 ms to complete, b is the number of requests
that took 10-20 ms to complete, c is the number of requests
that took 20-30 ms to complete and d is the number of
requests that took more than 30 ms to complete.
<program_id>
An optional parameter. A name that uniquely identifies
the userspace owner of the range. This groups ranges together
......@@ -55,6 +85,9 @@ Messages
created and ignore those created by others.
The kernel returns this string back in the output of
@stats_list message, but it doesn't use it for anything else.
If we omit the number of optional arguments, program id must not
be a number, otherwise it would be interpreted as the number of
optional arguments.
<aux_data>
An optional parameter. A word that provides auxiliary data
......
......@@ -304,6 +304,18 @@ config DM_CACHE_MQ
This is meant to be a general purpose policy. It prioritises
reads over writes.
config DM_CACHE_SMQ
tristate "Stochastic MQ Cache Policy (EXPERIMENTAL)"
depends on DM_CACHE
default y
---help---
A cache policy that uses a multiqueue ordered by recent hits
to select which blocks should be promoted and demoted.
This is meant to be a general purpose policy. It prioritises
reads over writes. This SMQ policy (vs MQ) offers the promise
of less memory utilization, improved performance and increased
adaptability in the face of changing workloads.
config DM_CACHE_CLEANER
tristate "Cleaner Cache Policy (EXPERIMENTAL)"
depends on DM_CACHE
......
......@@ -13,6 +13,7 @@ dm-log-userspace-y \
dm-thin-pool-y += dm-thin.o dm-thin-metadata.o
dm-cache-y += dm-cache-target.o dm-cache-metadata.o dm-cache-policy.o
dm-cache-mq-y += dm-cache-policy-mq.o
dm-cache-smq-y += dm-cache-policy-smq.o
dm-cache-cleaner-y += dm-cache-policy-cleaner.o
dm-era-y += dm-era-target.o
md-mod-y += md.o bitmap.o
......@@ -54,6 +55,7 @@ obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_CACHE) += dm-cache.o
obj-$(CONFIG_DM_CACHE_MQ) += dm-cache-mq.o
obj-$(CONFIG_DM_CACHE_SMQ) += dm-cache-smq.o
obj-$(CONFIG_DM_CACHE_CLEANER) += dm-cache-cleaner.o
obj-$(CONFIG_DM_ERA) += dm-era.o
obj-$(CONFIG_DM_LOG_WRITES) += dm-log-writes.o
......
......@@ -255,6 +255,32 @@ void dm_cell_visit_release(struct dm_bio_prison *prison,
}
EXPORT_SYMBOL_GPL(dm_cell_visit_release);
static int __promote_or_release(struct dm_bio_prison *prison,
struct dm_bio_prison_cell *cell)
{
if (bio_list_empty(&cell->bios)) {
rb_erase(&cell->node, &prison->cells);
return 1;
}
cell->holder = bio_list_pop(&cell->bios);
return 0;
}
int dm_cell_promote_or_release(struct dm_bio_prison *prison,
struct dm_bio_prison_cell *cell)
{
int r;
unsigned long flags;
spin_lock_irqsave(&prison->lock, flags);
r = __promote_or_release(prison, cell);
spin_unlock_irqrestore(&prison->lock, flags);
return r;
}
EXPORT_SYMBOL_GPL(dm_cell_promote_or_release);
/*----------------------------------------------------------------*/
#define DEFERRED_SET_SIZE 64
......
......@@ -101,6 +101,19 @@ void dm_cell_visit_release(struct dm_bio_prison *prison,
void (*visit_fn)(void *, struct dm_bio_prison_cell *),
void *context, struct dm_bio_prison_cell *cell);
/*
* Rather than always releasing the prisoners in a cell, the client may
* want to promote one of them to be the new holder. There is a race here
* though between releasing an empty cell, and other threads adding new
* inmates. So this function makes the decision with its lock held.
*
* This function can have two outcomes:
* i) An inmate is promoted to be the holder of the cell (return value of 0).
* ii) The cell has no inmate for promotion and is released (return value of 1).
*/
int dm_cell_promote_or_release(struct dm_bio_prison *prison,
struct dm_bio_prison_cell *cell);
/*----------------------------------------------------------------*/
/*
......
......@@ -39,6 +39,8 @@
enum superblock_flag_bits {
/* for spotting crashes that would invalidate the dirty bitset */
CLEAN_SHUTDOWN,
/* metadata must be checked using the tools */
NEEDS_CHECK,
};
/*
......@@ -107,6 +109,7 @@ struct dm_cache_metadata {
struct dm_disk_bitset discard_info;
struct rw_semaphore root_lock;
unsigned long flags;
dm_block_t root;
dm_block_t hint_root;
dm_block_t discard_root;
......@@ -129,6 +132,14 @@ struct dm_cache_metadata {
* buffer before the superblock is locked and updated.
*/
__u8 metadata_space_map_root[SPACE_MAP_ROOT_SIZE];
/*
* Set if a transaction has to be aborted but the attempt to roll
* back to the previous (good) transaction failed. The only
* metadata operation permissible in this state is the closing of
* the device.
*/
bool fail_io:1;
};
/*-------------------------------------------------------------------
......@@ -527,6 +538,7 @@ static unsigned long clear_clean_shutdown(unsigned long flags)
static void read_superblock_fields(struct dm_cache_metadata *cmd,
struct cache_disk_superblock *disk_super)
{
cmd->flags = le32_to_cpu(disk_super->flags);
cmd->root = le64_to_cpu(disk_super->mapping_root);
cmd->hint_root = le64_to_cpu(disk_super->hint_root);
cmd->discard_root = le64_to_cpu(disk_super->discard_root);
......@@ -625,6 +637,7 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
if (mutator)
update_flags(disk_super, mutator);
disk_super->flags = cpu_to_le32(cmd->flags);
disk_super->mapping_root = cpu_to_le64(cmd->root);
disk_super->hint_root = cpu_to_le64(cmd->hint_root);
disk_super->discard_root = cpu_to_le64(cmd->discard_root);
......@@ -693,6 +706,7 @@ static struct dm_cache_metadata *metadata_open(struct block_device *bdev,
cmd->cache_blocks = 0;
cmd->policy_hint_size = policy_hint_size;
cmd->changed = true;
cmd->fail_io = false;
r = __create_persistent_data_objects(cmd, may_format_device);
if (r) {
......@@ -796,6 +810,7 @@ void dm_cache_metadata_close(struct dm_cache_metadata *cmd)
list_del(&cmd->list);
mutex_unlock(&table_lock);
if (!cmd->fail_io)
__destroy_persistent_data_objects(cmd);
kfree(cmd);
}
......@@ -848,13 +863,26 @@ static int blocks_are_unmapped_or_clean(struct dm_cache_metadata *cmd,
return 0;
}
#define WRITE_LOCK(cmd) \
if (cmd->fail_io || dm_bm_is_read_only(cmd->bm)) \
return -EINVAL; \
down_write(&cmd->root_lock)
#define WRITE_LOCK_VOID(cmd) \
if (cmd->fail_io || dm_bm_is_read_only(cmd->bm)) \
return; \
down_write(&cmd->root_lock)
#define WRITE_UNLOCK(cmd) \
up_write(&cmd->root_lock)
int dm_cache_resize(struct dm_cache_metadata *cmd, dm_cblock_t new_cache_size)
{
int r;
bool clean;
__le64 null_mapping = pack_value(0, 0);
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
__dm_bless_for_disk(&null_mapping);
if (from_cblock(new_cache_size) < from_cblock(cmd->cache_blocks)) {
......@@ -880,7 +908,7 @@ int dm_cache_resize(struct dm_cache_metadata *cmd, dm_cblock_t new_cache_size)
cmd->changed = true;
out:
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -891,7 +919,7 @@ int dm_cache_discard_bitset_resize(struct dm_cache_metadata *cmd,
{
int r;
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
r = dm_bitset_resize(&cmd->discard_info,
cmd->discard_root,
from_dblock(cmd->discard_nr_blocks),
......@@ -903,7 +931,7 @@ int dm_cache_discard_bitset_resize(struct dm_cache_metadata *cmd,
}
cmd->changed = true;
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -946,9 +974,9 @@ int dm_cache_set_discard(struct dm_cache_metadata *cmd,
{
int r;
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
r = __discard(cmd, dblock, discard);
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -1020,9 +1048,9 @@ int dm_cache_remove_mapping(struct dm_cache_metadata *cmd, dm_cblock_t cblock)
{
int r;
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
r = __remove(cmd, cblock);
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -1048,9 +1076,9 @@ int dm_cache_insert_mapping(struct dm_cache_metadata *cmd,
{
int r;
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
r = __insert(cmd, cblock, oblock);
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -1234,9 +1262,9 @@ int dm_cache_set_dirty(struct dm_cache_metadata *cmd,
{
int r;
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
r = __dirty(cmd, cblock, dirty);
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -1252,9 +1280,9 @@ void dm_cache_metadata_get_stats(struct dm_cache_metadata *cmd,
void dm_cache_metadata_set_stats(struct dm_cache_metadata *cmd,
struct dm_cache_statistics *stats)
{
down_write(&cmd->root_lock);
WRITE_LOCK_VOID(cmd);
cmd->stats = *stats;
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
}
int dm_cache_commit(struct dm_cache_metadata *cmd, bool clean_shutdown)
......@@ -1263,7 +1291,7 @@ int dm_cache_commit(struct dm_cache_metadata *cmd, bool clean_shutdown)
flags_mutator mutator = (clean_shutdown ? set_clean_shutdown :
clear_clean_shutdown);
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
r = __commit_transaction(cmd, mutator);
if (r)
goto out;
......@@ -1271,7 +1299,7 @@ int dm_cache_commit(struct dm_cache_metadata *cmd, bool clean_shutdown)
r = __begin_transaction(cmd);
out:
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -1376,9 +1404,9 @@ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
{
int r;
down_write(&cmd->root_lock);
WRITE_LOCK(cmd);
r = write_hints(cmd, policy);
up_write(&cmd->root_lock);
WRITE_UNLOCK(cmd);
return r;
}
......@@ -1387,3 +1415,70 @@ int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
{
return blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
}
void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd)
{
WRITE_LOCK_VOID(cmd);
dm_bm_set_read_only(cmd->bm);
WRITE_UNLOCK(cmd);
}
void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd)
{
WRITE_LOCK_VOID(cmd);
dm_bm_set_read_write(cmd->bm);
WRITE_UNLOCK(cmd);
}
int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd)
{
int r;
struct dm_block *sblock;
struct cache_disk_superblock *disk_super;
/*
* We ignore fail_io for this function.
*/
down_write(&cmd->root_lock);
set_bit(NEEDS_CHECK, &cmd->flags);
r = superblock_lock(cmd, &sblock);
if (r) {
DMERR("couldn't read superblock");
goto out;
}
disk_super = dm_block_data(sblock);
disk_super->flags = cpu_to_le32(cmd->flags);
dm_bm_unlock(sblock);
out:
up_write(&cmd->root_lock);
return r;
}
bool dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd)
{
bool needs_check;
down_read(&cmd->root_lock);
needs_check = !!test_bit(NEEDS_CHECK, &cmd->flags);
up_read(&cmd->root_lock);
return needs_check;
}
int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
{
int r;
WRITE_LOCK(cmd);
__destroy_persistent_data_objects(cmd);
r = __create_persistent_data_objects(cmd, false);
if (r)
cmd->fail_io = true;
WRITE_UNLOCK(cmd);
return r;
}
......@@ -102,6 +102,10 @@ struct dm_cache_statistics {
void dm_cache_metadata_get_stats(struct dm_cache_metadata *cmd,
struct dm_cache_statistics *stats);
/*
* 'void' because it's no big deal if it fails.
*/
void dm_cache_metadata_set_stats(struct dm_cache_metadata *cmd,
struct dm_cache_statistics *stats);
......@@ -133,6 +137,12 @@ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
*/
int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);
bool dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd);
int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd);
int dm_cache_metadata_abort(struct dm_cache_metadata *cmd);
/*----------------------------------------------------------------*/
#endif /* DM_CACHE_METADATA_H */
......@@ -171,7 +171,8 @@ static void remove_cache_hash_entry(struct wb_cache_entry *e)
/* Public interface (see dm-cache-policy.h */
static int wb_map(struct dm_cache_policy *pe, dm_oblock_t oblock,
bool can_block, bool can_migrate, bool discarded_oblock,
struct bio *bio, struct policy_result *result)
struct bio *bio, struct policy_locker *locker,
struct policy_result *result)
{
struct policy *p = to_policy(pe);
struct wb_cache_entry *e;
......@@ -358,7 +359,8 @@ static struct wb_cache_entry *get_next_dirty_entry(struct policy *p)
static int wb_writeback_work(struct dm_cache_policy *pe,
dm_oblock_t *oblock,
dm_cblock_t *cblock)
dm_cblock_t *cblock,
bool critical_only)
{
int r = -ENOENT;
struct policy *p = to_policy(pe);
......
......@@ -7,6 +7,7 @@
#ifndef DM_CACHE_POLICY_INTERNAL_H
#define DM_CACHE_POLICY_INTERNAL_H
#include <linux/vmalloc.h>
#include "dm-cache-policy.h"
/*----------------------------------------------------------------*/
......@@ -16,9 +17,10 @@
*/
static inline int policy_map(struct dm_cache_policy *p, dm_oblock_t oblock,
bool can_block, bool can_migrate, bool discarded_oblock,
struct bio *bio, struct policy_result *result)
struct bio *bio, struct policy_locker *locker,
struct policy_result *result)
{
return p->map(p, oblock, can_block, can_migrate, discarded_oblock, bio, result);
return p->map(p, oblock, can_block, can_migrate, discarded_oblock, bio, locker, result);
}
static inline int policy_lookup(struct dm_cache_policy *p, dm_oblock_t oblock, dm_cblock_t *cblock)
......@@ -54,9 +56,10 @@ static inline int policy_walk_mappings(struct dm_cache_policy *p,
static inline int policy_writeback_work(struct dm_cache_policy *p,
dm_oblock_t *oblock,
dm_cblock_t *cblock)
dm_cblock_t *cblock,
bool critical_only)
{
return p->writeback_work ? p->writeback_work(p, oblock, cblock) : -ENOENT;
return p->writeback_work ? p->writeback_work(p, oblock, cblock, critical_only) : -ENOENT;
}
static inline void policy_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
......@@ -80,19 +83,21 @@ static inline dm_cblock_t policy_residency(struct dm_cache_policy *p)
return p->residency(p);
}
static inline void policy_tick(struct dm_cache_policy *p)
static inline void policy_tick(struct dm_cache_policy *p, bool can_block)
{
if (p->tick)
return p->tick(p);
return p->tick(p, can_block);
}
static inline int policy_emit_config_values(struct dm_cache_policy *p, char *result, unsigned maxlen)
static inline int policy_emit_config_values(struct dm_cache_policy *p, char *result,
unsigned maxlen, ssize_t *sz_ptr)
{
ssize_t sz = 0;
ssize_t sz = *sz_ptr;
if (p->emit_config_values)
return p->emit_config_values(p, result, maxlen);
return p->emit_config_values(p, result, maxlen, sz_ptr);
DMEMIT("0");
DMEMIT("0 ");
*sz_ptr = sz;
return 0;
}
......@@ -104,6 +109,33 @@ static inline int policy_set_config_value(struct dm_cache_policy *p,
/*----------------------------------------------------------------*/
/*
* Some utility functions commonly used by policies and the core target.
*/
static inline size_t bitset_size_in_bytes(unsigned nr_entries)
{
return sizeof(unsigned long) * dm_div_up(nr_entries, BITS_PER_LONG);
}
static inline unsigned long *alloc_bitset(unsigned nr_entries)
{
size_t s = bitset_size_in_bytes(nr_entries);
return vzalloc(s);
}
static inline void clear_bitset(void *bitset, unsigned nr_entries)
{
size_t s = bitset_size_in_bytes(nr_entries);
memset(bitset, 0, s);
}
static inline void free_bitset(unsigned long *bits)
{
vfree(bits);
}
/*----------------------------------------------------------------*/
/*
* Creates a new cache policy given a policy name, a cache size, an origin size and the block size.
*/
......
......@@ -693,9 +693,10 @@ static void requeue(struct mq_policy *mq, struct entry *e)
* - set the hit count to a hard coded value other than 1, eg, is it better
* if it goes in at level 2?
*/
static int demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock)
static int demote_cblock(struct mq_policy *mq,
struct policy_locker *locker, dm_oblock_t *oblock)
{
struct entry *demoted = pop(mq, &mq->cache_clean);
struct entry *demoted = peek(&mq->cache_clean);
if (!demoted)
/*
......@@ -707,6 +708,13 @@ static int demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock)
*/
return -ENOSPC;
if (locker->fn(locker, demoted->oblock))
/*
* We couldn't lock the demoted block.
*/
return -EBUSY;
del(mq, demoted);
*oblock = demoted->oblock;
free_entry(&mq->cache_pool, demoted);
......@@ -795,6 +803,7 @@ static int cache_entry_found(struct mq_policy *mq,
* finding which cache block to use.
*/
static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e,
struct policy_locker *locker,
struct policy_result *result)
{
int r;
......@@ -803,11 +812,12 @@ static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e,
/* Ensure there's a free cblock in the cache */
if (epool_empty(&mq->cache_pool)) {
result->op = POLICY_REPLACE;
r = demote_cblock(mq, &result->old_oblock);
r = demote_cblock(mq, locker, &result->old_oblock);
if (r) {
result->op = POLICY_MISS;
return 0;
}
} else
result->op = POLICY_NEW;
......@@ -829,7 +839,8 @@ static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e,
static int pre_cache_entry_found(struct mq_policy *mq, struct entry *e,
bool can_migrate, bool discarded_oblock,
int data_dir, struct policy_result *result)
int data_dir, struct policy_locker *locker,
struct policy_result *result)
{
int r = 0;
......@@ -842,7 +853,7 @@ static int pre_cache_entry_found(struct mq_policy *mq, struct entry *e,
else {
requeue(mq, e);
r = pre_cache_to_cache(mq, e, result);
r = pre_cache_to_cache(mq, e, locker, result);
}
return r;
......@@ -872,6 +883,7 @@ static void insert_in_pre_cache(struct mq_policy *mq,
}
static void insert_in_cache(struct mq_policy *mq, dm_oblock_t oblock,
struct policy_locker *locker,
struct policy_result *result)
{
int r;
......@@ -879,7 +891,7 @@ static void insert_in_cache(struct mq_policy *mq, dm_oblock_t oblock,
if (epool_empty(&mq->cache_pool)) {
result->op = POLICY_REPLACE;
r = demote_cblock(mq, &result->old_oblock);
r = demote_cblock(mq, locker, &result->old_oblock);
if (unlikely(r)) {
result->op = POLICY_MISS;
insert_in_pre_cache(mq, oblock);
......@@ -907,11 +919,12 @@ static void insert_in_cache(struct mq_policy *mq, dm_oblock_t oblock,
static int no_entry_found(struct mq_policy *mq, dm_oblock_t oblock,
bool can_migrate, bool discarded_oblock,
int data_dir, struct policy_result *result)
int data_dir, struct policy_locker *locker,
struct policy_result *result)
{
if (adjusted_promote_threshold(mq, discarded_oblock, data_dir) <= 1) {
if (can_migrate)
insert_in_cache(mq, oblock, result);
insert_in_cache(mq, oblock, locker, result);
else
return -EWOULDBLOCK;
} else {
......@@ -928,7 +941,8 @@ static int no_entry_found(struct mq_policy *mq, dm_oblock_t oblock,
*/
static int map(struct mq_policy *mq, dm_oblock_t oblock,
bool can_migrate, bool discarded_oblock,
int data_dir, struct policy_result *result)
int data_dir, struct policy_locker *locker,
struct policy_result *result)
{
int r = 0;
struct entry *e = hash_lookup(mq, oblock);
......@@ -942,11 +956,11 @@ static int map(struct mq_policy *mq, dm_oblock_t oblock,
else if (e)
r = pre_cache_entry_found(mq, e, can_migrate, discarded_oblock,
data_dir, result);
data_dir, locker, result);
else
r = no_entry_found(mq, oblock, can_migrate, discarded_oblock,
data_dir, result);
data_dir, locker, result);
if (r == -EWOULDBLOCK)
result->op = POLICY_MISS;
......@@ -1012,7 +1026,8 @@ static void copy_tick(struct mq_policy *mq)
static int mq_map(struct dm_cache_policy *p, dm_oblock_t oblock,
bool can_block, bool can_migrate, bool discarded_oblock,
struct bio *bio, struct policy_result *result)
struct bio *bio, struct policy_locker *locker,
struct policy_result *result)
{
int r;
struct mq_policy *mq = to_mq_policy(p);
......@@ -1028,7 +1043,7 @@ static int mq_map(struct dm_cache_policy *p, dm_oblock_t oblock,
iot_examine_bio(&mq->tracker, bio);
r = map(mq, oblock, can_migrate, discarded_oblock,
bio_data_dir(bio), result);
bio_data_dir(bio), locker, result);
mutex_unlock(&mq->lock);
......@@ -1221,7 +1236,7 @@ static int __mq_writeback_work(struct mq_policy *mq, dm_oblock_t *oblock,
}
static int mq_writeback_work(struct dm_cache_policy *p, dm_oblock_t *oblock,
dm_cblock_t *cblock)
dm_cblock_t *cblock, bool critical_only)
{
int r;
struct mq_policy *mq = to_mq_policy(p);
......@@ -1268,7 +1283,7 @@ static dm_cblock_t mq_residency(struct dm_cache_policy *p)
return r;
}
static void mq_tick(struct dm_cache_policy *p)
static void mq_tick(struct dm_cache_policy *p, bool can_block)
{
struct mq_policy *mq = to_mq_policy(p);
unsigned long flags;
......@@ -1276,6 +1291,12 @@ static void mq_tick(struct dm_cache_policy *p)
spin_lock_irqsave(&mq->tick_lock, flags);
mq->tick_protected++;
spin_unlock_irqrestore(&mq->tick_lock, flags);
if (can_block) {
mutex_lock(&mq->lock);
copy_tick(mq);
mutex_unlock(&mq->lock);
}
}
static int mq_set_config_value(struct dm_cache_policy *p,
......@@ -1308,22 +1329,24 @@ static int mq_set_config_value(struct dm_cache_policy *p,
return 0;
}
static int mq_emit_config_values(struct dm_cache_policy *p, char *result, unsigned maxlen)
static int mq_emit_config_values(struct dm_cache_policy *p, char *result,
unsigned maxlen, ssize_t *sz_ptr)
{
ssize_t sz = 0;
ssize_t sz = *sz_ptr;
struct mq_policy *mq = to_mq_policy(p);
DMEMIT("10 random_threshold %u "
"sequential_threshold %u "
"discard_promote_adjustment %u "
"read_promote_adjustment %u "
"write_promote_adjustment %u",
"write_promote_adjustment %u ",
mq->tracker.thresholds[PATTERN_RANDOM],
mq->tracker.thresholds[PATTERN_SEQUENTIAL],
mq->discard_promote_adjustment,
mq->read_promote_adjustment,
mq->write_promote_adjustment);
*sz_ptr = sz;
return 0;
}
......@@ -1408,21 +1431,12 @@ static struct dm_cache_policy *mq_create(dm_cblock_t cache_size,
static struct dm_cache_policy_type mq_policy_type = {
.name = "mq",
.version = {1, 3, 0},
.version = {1, 4, 0},
.hint_size = 4,
.owner = THIS_MODULE,
.create = mq_create
};
static struct dm_cache_policy_type default_policy_type = {
.name = "default",
.version = {1, 3, 0},
.hint_size = 4,
.owner = THIS_MODULE,
.create = mq_create,
.real = &mq_policy_type
};
static int __init mq_init(void)
{
int r;
......@@ -1432,36 +1446,21 @@ static int __init mq_init(void)
__alignof__(struct entry),
0, NULL);
if (!mq_entry_cache)
goto bad;
return -ENOMEM;
r = dm_cache_policy_register(&mq_policy_type);
if (r) {
DMERR("register failed %d", r);
goto bad_register_mq;
kmem_cache_destroy(mq_entry_cache);
return -ENOMEM;
}
r = dm_cache_policy_register(&default_policy_type);
if (!r) {
DMINFO("version %u.%u.%u loaded",
mq_policy_type.version[0],
mq_policy_type.version[1],
mq_policy_type.version[2]);
return 0;
}
DMERR("register failed (as default) %d", r);
dm_cache_policy_unregister(&mq_policy_type);
bad_register_mq:
kmem_cache_destroy(mq_entry_cache);
bad:
return -ENOMEM;
}
static void __exit mq_exit(void)
{
dm_cache_policy_unregister(&mq_policy_type);
dm_cache_policy_unregister(&default_policy_type);
kmem_cache_destroy(mq_entry_cache);
}
......
This diff is collapsed.
......@@ -69,6 +69,18 @@ enum policy_operation {
POLICY_REPLACE
};
/*
* When issuing a POLICY_REPLACE the policy needs to make a callback to
* lock the block being demoted. This doesn't need to occur during a
* writeback operation since the block remains in the cache.
*/
struct policy_locker;
typedef int (*policy_lock_fn)(struct policy_locker *l, dm_oblock_t oblock);
struct policy_locker {
policy_lock_fn fn;
};
/*
* This is the instruction passed back to the core target.
*/
......@@ -122,7 +134,8 @@ struct dm_cache_policy {
*/
int (*map)(struct dm_cache_policy *p, dm_oblock_t oblock,
bool can_block, bool can_migrate, bool discarded_oblock,
struct bio *bio, struct policy_result *result);
struct bio *bio, struct policy_locker *locker,
struct policy_result *result);
/*
* Sometimes we want to see if a block is in the cache, without
......@@ -165,7 +178,9 @@ struct dm_cache_policy {
int (*remove_cblock)(struct dm_cache_policy *p, dm_cblock_t cblock);
/*
* Provide a dirty block to be written back by the core target.
* Provide a dirty block to be written back by the core target. If
* critical_only is set then the policy should only provide work if
* it urgently needs it.
*
* Returns:
*
......@@ -173,7 +188,8 @@ struct dm_cache_policy {
*
* -ENODATA: no dirty blocks available
*/
int (*writeback_work)(struct dm_cache_policy *p, dm_oblock_t *oblock, dm_cblock_t *cblock);
int (*writeback_work)(struct dm_cache_policy *p, dm_oblock_t *oblock, dm_cblock_t *cblock,
bool critical_only);
/*
* How full is the cache?
......@@ -184,16 +200,16 @@ struct dm_cache_policy {
* Because of where we sit in the block layer, we can be asked to
* map a lot of little bios that are all in the same block (no
* queue merging has occurred). To stop the policy being fooled by
* these the core target sends regular tick() calls to the policy.
* these, the core target sends regular tick() calls to the policy.
* The policy should only count an entry as hit once per tick.
*/
void (*tick)(struct dm_cache_policy *p);
void (*tick)(struct dm_cache_policy *p, bool can_block);
/*
* Configuration.
*/
int (*emit_config_values)(struct dm_cache_policy *p,
char *result, unsigned maxlen);
int (*emit_config_values)(struct dm_cache_policy *p, char *result,
unsigned maxlen, ssize_t *sz_ptr);
int (*set_config_value)(struct dm_cache_policy *p,
const char *key, const char *value);
......
This diff is collapsed.
/*
* Copyright (C) 2003 Jana Saout <jana@saout.de>
* Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
* Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved.
* Copyright (C) 2006-2015 Red Hat, Inc. All rights reserved.
* Copyright (C) 2013 Milan Broz <gmazyland@gmail.com>
*
* This file is released under the GPL.
......@@ -891,6 +891,11 @@ static void crypt_alloc_req(struct crypt_config *cc,
ctx->req = mempool_alloc(cc->req_pool, GFP_NOIO);
ablkcipher_request_set_tfm(ctx->req, cc->tfms[key_index]);
/*
* Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
* requests if driver request queue is full.
*/
ablkcipher_request_set_callback(ctx->req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
kcryptd_async_done, dmreq_of_req(cc, ctx->req));
......@@ -924,24 +929,32 @@ static int crypt_convert(struct crypt_config *cc,
r = crypt_convert_block(cc, ctx, ctx->req);
switch (r) {
/* async */
/*
* The request was queued by a crypto driver
* but the driver request queue is full, let's wait.
*/
case -EBUSY:
wait_for_completion(&ctx->restart);
reinit_completion(&ctx->restart);
/* fall through*/
/* fall through */
/*
* The request is queued and processed asynchronously,
* completion function kcryptd_async_done() will be called.
*/
case -EINPROGRESS:
ctx->req = NULL;
ctx->cc_sector++;
continue;
/* sync */
/*
* The request was already processed (synchronously).
*/
case 0:
atomic_dec(&ctx->cc_pending);
ctx->cc_sector++;
cond_resched();
continue;
/* error */
/* There was an error while processing the request. */
default:
atomic_dec(&ctx->cc_pending);
return r;
......@@ -1346,6 +1359,11 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
struct crypt_config *cc = io->cc;
/*
* A request from crypto driver backlog is going to be processed now,
* finish the completion and continue in crypt_convert().
* (Callback will be called for the second time for this request.)
*/
if (error == -EINPROGRESS) {
complete(&ctx->restart);
return;
......
......@@ -55,8 +55,8 @@
#define LOG_DISCARD_FLAG (1 << 2)
#define LOG_MARK_FLAG (1 << 3)
#define WRITE_LOG_VERSION 1
#define WRITE_LOG_MAGIC 0x6a736677736872
#define WRITE_LOG_VERSION 1ULL
#define WRITE_LOG_MAGIC 0x6a736677736872ULL
/*
* The disk format for this is braindead simple.
......
This diff is collapsed.
......@@ -24,7 +24,9 @@
#define MAX_RECOVERY 1 /* Maximum number of regions recovered in parallel. */
#define DM_RAID1_HANDLE_ERRORS 0x01
#define DM_RAID1_KEEP_LOG 0x02
#define errors_handled(p) ((p)->features & DM_RAID1_HANDLE_ERRORS)
#define keep_log(p) ((p)->features & DM_RAID1_KEEP_LOG)
static DECLARE_WAIT_QUEUE_HEAD(_kmirrord_recovery_stopped);
......@@ -229,7 +231,7 @@ static void fail_mirror(struct mirror *m, enum dm_raid1_error error_type)
if (m != get_default_mirror(ms))
goto out;
if (!ms->in_sync) {
if (!ms->in_sync && !keep_log(ms)) {
/*
* Better to issue requests to same failing device
* than to risk returning corrupt data.
......@@ -370,6 +372,17 @@ static int recover(struct mirror_set *ms, struct dm_region *reg)
return r;
}
static void reset_ms_flags(struct mirror_set *ms)
{
unsigned int m;
ms->leg_failure = 0;
for (m = 0; m < ms->nr_mirrors; m++) {
atomic_set(&(ms->mirror[m].error_count), 0);
ms->mirror[m].error_type = 0;
}
}
static void do_recovery(struct mirror_set *ms)
{
struct dm_region *reg;
......@@ -398,6 +411,7 @@ static void do_recovery(struct mirror_set *ms)
/* the sync is complete */
dm_table_event(ms->ti->table);
ms->in_sync = 1;
reset_ms_flags(ms);
}
}
......@@ -759,7 +773,7 @@ static void do_writes(struct mirror_set *ms, struct bio_list *writes)
dm_rh_delay(ms->rh, bio);
while ((bio = bio_list_pop(&nosync))) {
if (unlikely(ms->leg_failure) && errors_handled(ms)) {
if (unlikely(ms->leg_failure) && errors_handled(ms) && !keep_log(ms)) {
spin_lock_irq(&ms->lock);
bio_list_add(&ms->failures, bio);
spin_unlock_irq(&ms->lock);
......@@ -803,15 +817,21 @@ static void do_failures(struct mirror_set *ms, struct bio_list *failures)
/*
* If all the legs are dead, fail the I/O.
* If we have been told to handle errors, hold the bio
* and wait for userspace to deal with the problem.
* If the device has failed and keep_log is enabled,
* fail the I/O.
*
* If we have been told to handle errors, and keep_log
* isn't enabled, hold the bio and wait for userspace to
* deal with the problem.
*
* Otherwise pretend that the I/O succeeded. (This would
* be wrong if the failed leg returned after reboot and
* got replicated back to the good legs.)
*/
if (!get_valid_mirror(ms))
if (unlikely(!get_valid_mirror(ms) || (keep_log(ms) && ms->log_failure)))
bio_endio(bio, -EIO);
else if (errors_handled(ms))
else if (errors_handled(ms) && !keep_log(ms))
hold_bio(ms, bio);
else
bio_endio(bio, 0);
......@@ -987,6 +1007,7 @@ static int parse_features(struct mirror_set *ms, unsigned argc, char **argv,
unsigned num_features;
struct dm_target *ti = ms->ti;
char dummy;
int i;
*args_used = 0;
......@@ -1007,14 +1028,24 @@ static int parse_features(struct mirror_set *ms, unsigned argc, char **argv,
return -EINVAL;
}
for (i = 0; i < num_features; i++) {
if (!strcmp("handle_errors", argv[0]))
ms->features |= DM_RAID1_HANDLE_ERRORS;
else if (!strcmp("keep_log", argv[0]))
ms->features |= DM_RAID1_KEEP_LOG;
else {
ti->error = "Unrecognised feature requested";
return -EINVAL;
}
argc--;
argv++;
(*args_used)++;
}
if (!errors_handled(ms) && keep_log(ms)) {
ti->error = "keep_log feature requires the handle_errors feature";
return -EINVAL;
}
return 0;
}
......@@ -1029,7 +1060,7 @@ static int parse_features(struct mirror_set *ms, unsigned argc, char **argv,
* log_type is "core" or "disk"
* #log_params is between 1 and 3
*
* If present, features must be "handle_errors".
* If present, supported features are "handle_errors" and "keep_log".
*/
static int mirror_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
......@@ -1363,6 +1394,7 @@ static void mirror_status(struct dm_target *ti, status_type_t type,
unsigned status_flags, char *result, unsigned maxlen)
{
unsigned int m, sz = 0;
int num_feature_args = 0;
struct mirror_set *ms = (struct mirror_set *) ti->private;
struct dm_dirty_log *log = dm_rh_dirty_log(ms->rh);
char buffer[ms->nr_mirrors + 1];
......@@ -1392,8 +1424,17 @@ static void mirror_status(struct dm_target *ti, status_type_t type,
DMEMIT(" %s %llu", ms->mirror[m].dev->name,
(unsigned long long)ms->mirror[m].offset);
if (ms->features & DM_RAID1_HANDLE_ERRORS)
DMEMIT(" 1 handle_errors");
num_feature_args += !!errors_handled(ms);
num_feature_args += !!keep_log(ms);
if (num_feature_args) {
DMEMIT(" %d", num_feature_args);
if (errors_handled(ms))
DMEMIT(" handle_errors");
if (keep_log(ms))
DMEMIT(" keep_log");
}
break;
}
}
......@@ -1413,7 +1454,7 @@ static int mirror_iterate_devices(struct dm_target *ti,
static struct target_type mirror_target = {
.name = "mirror",
.version = {1, 13, 2},
.version = {1, 14, 0},
.module = THIS_MODULE,
.ctr = mirror_ctr,
.dtr = mirror_dtr,
......
This diff is collapsed.
......@@ -18,6 +18,7 @@ struct dm_stats {
struct dm_stats_aux {
bool merged;
unsigned long long duration_ns;
};
void dm_stats_init(struct dm_stats *st);
......@@ -30,7 +31,8 @@ int dm_stats_message(struct mapped_device *md, unsigned argc, char **argv,
void dm_stats_account_io(struct dm_stats *stats, unsigned long bi_rw,
sector_t bi_sector, unsigned bi_sectors, bool end,
unsigned long duration, struct dm_stats_aux *aux);
unsigned long duration_jiffies,
struct dm_stats_aux *aux);
static inline bool dm_stats_used(struct dm_stats *st)
{
......
......@@ -451,10 +451,8 @@ int __init dm_stripe_init(void)
int r;
r = dm_register_target(&stripe_target);
if (r < 0) {
if (r < 0)
DMWARN("target registration failed");
return r;
}
return r;
}
......
......@@ -964,8 +964,8 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *
return -EINVAL;
}
if (!t->mempools)
return -ENOMEM;
if (IS_ERR(t->mempools))
return PTR_ERR(t->mempools);
return 0;
}
......
......@@ -184,7 +184,6 @@ struct dm_pool_metadata {
uint64_t trans_id;
unsigned long flags;
sector_t data_block_size;
bool read_only:1;
/*
* Set if a transaction has to be aborted but the attempt to roll back
......@@ -836,7 +835,6 @@ struct dm_pool_metadata *dm_pool_metadata_open(struct block_device *bdev,
init_rwsem(&pmd->root_lock);
pmd->time = 0;
INIT_LIST_HEAD(&pmd->thin_devices);
pmd->read_only = false;
pmd->fail_io = false;
pmd->bdev = bdev;
pmd->data_block_size = data_block_size;
......@@ -880,7 +878,7 @@ int dm_pool_metadata_close(struct dm_pool_metadata *pmd)
return -EBUSY;
}
if (!pmd->read_only && !pmd->fail_io) {
if (!dm_bm_is_read_only(pmd->bm) && !pmd->fail_io) {
r = __commit_transaction(pmd);
if (r < 0)
DMWARN("%s: __commit_transaction() failed, error = %d",
......@@ -1392,10 +1390,11 @@ int dm_thin_find_block(struct dm_thin_device *td, dm_block_t block,
dm_block_t keys[2] = { td->id, block };
struct dm_btree_info *info;
if (pmd->fail_io)
return -EINVAL;
down_read(&pmd->root_lock);
if (pmd->fail_io) {
up_read(&pmd->root_lock);
return -EINVAL;
}
if (can_issue_io) {
info = &pmd->info;
......@@ -1419,6 +1418,63 @@ int dm_thin_find_block(struct dm_thin_device *td, dm_block_t block,
return r;
}
/* FIXME: write a more efficient one in btree */
int dm_thin_find_mapped_range(struct dm_thin_device *td,
dm_block_t begin, dm_block_t end,
dm_block_t *thin_begin, dm_block_t *thin_end,
dm_block_t *pool_begin, bool *maybe_shared)
{
int r;
dm_block_t pool_end;
struct dm_thin_lookup_result lookup;
if (end < begin)
return -ENODATA;
/*
* Find first mapped block.
*/
while (begin < end) {
r = dm_thin_find_block(td, begin, true, &lookup);
if (r) {
if (r != -ENODATA)
return r;
} else
break;
begin++;
}
if (begin == end)
return -ENODATA;
*thin_begin = begin;
*pool_begin = lookup.block;
*maybe_shared = lookup.shared;
begin++;
pool_end = *pool_begin + 1;
while (begin != end) {
r = dm_thin_find_block(td, begin, true, &lookup);
if (r) {
if (r == -ENODATA)
break;
else
return r;
}
if ((lookup.block != pool_end) ||
(lookup.shared != *maybe_shared))
break;
pool_end++;
begin++;
}
*thin_end = begin;
return 0;
}
static int __insert(struct dm_thin_device *td, dm_block_t block,
dm_block_t data_block)
{
......@@ -1471,6 +1527,47 @@ static int __remove(struct dm_thin_device *td, dm_block_t block)
return 0;
}
static int __remove_range(struct dm_thin_device *td, dm_block_t begin, dm_block_t end)
{
int r;
unsigned count;
struct dm_pool_metadata *pmd = td->pmd;
dm_block_t keys[1] = { td->id };
__le64 value;
dm_block_t mapping_root;
/*
* Find the mapping tree
*/
r = dm_btree_lookup(&pmd->tl_info, pmd->root, keys, &value);
if (r)
return r;
/*
* Remove from the mapping tree, taking care to inc the
* ref count so it doesn't get deleted.
*/
mapping_root = le64_to_cpu(value);
dm_tm_inc(pmd->tm, mapping_root);
r = dm_btree_remove(&pmd->tl_info, pmd->root, keys, &pmd->root);
if (r)
return r;
r = dm_btree_remove_leaves(&pmd->bl_info, mapping_root, &begin, end, &mapping_root, &count);
if (r)
return r;
td->mapped_blocks -= count;
td->changed = 1;
/*
* Reinsert the mapping tree.
*/
value = cpu_to_le64(mapping_root);
__dm_bless_for_disk(&value);
return dm_btree_insert(&pmd->tl_info, pmd->root, keys, &value, &pmd->root);
}
int dm_thin_remove_block(struct dm_thin_device *td, dm_block_t block)
{
int r = -EINVAL;
......@@ -1483,6 +1580,19 @@ int dm_thin_remove_block(struct dm_thin_device *td, dm_block_t block)
return r;
}
int dm_thin_remove_range(struct dm_thin_device *td,
dm_block_t begin, dm_block_t end)
{
int r = -EINVAL;
down_write(&td->pmd->root_lock);
if (!td->pmd->fail_io)
r = __remove_range(td, begin, end);
up_write(&td->pmd->root_lock);
return r;
}
int dm_pool_block_is_used(struct dm_pool_metadata *pmd, dm_block_t b, bool *result)
{
int r;
......@@ -1739,7 +1849,6 @@ int dm_pool_resize_metadata_dev(struct dm_pool_metadata *pmd, dm_block_t new_cou
void dm_pool_metadata_read_only(struct dm_pool_metadata *pmd)
{
down_write(&pmd->root_lock);
pmd->read_only = true;
dm_bm_set_read_only(pmd->bm);
up_write(&pmd->root_lock);
}
......@@ -1747,7 +1856,6 @@ void dm_pool_metadata_read_only(struct dm_pool_metadata *pmd)
void dm_pool_metadata_read_write(struct dm_pool_metadata *pmd)
{
down_write(&pmd->root_lock);
pmd->read_only = false;
dm_bm_set_read_write(pmd->bm);
up_write(&pmd->root_lock);
}
......
......@@ -146,6 +146,15 @@ struct dm_thin_lookup_result {
int dm_thin_find_block(struct dm_thin_device *td, dm_block_t block,
int can_issue_io, struct dm_thin_lookup_result *result);
/*
* Retrieve the next run of contiguously mapped blocks. Useful for working
* out where to break up IO. Returns 0 on success, < 0 on error.
*/
int dm_thin_find_mapped_range(struct dm_thin_device *td,
dm_block_t begin, dm_block_t end,
dm_block_t *thin_begin, dm_block_t *thin_end,
dm_block_t *pool_begin, bool *maybe_shared);
/*
* Obtain an unused block.
*/
......@@ -158,6 +167,8 @@ int dm_thin_insert_block(struct dm_thin_device *td, dm_block_t block,
dm_block_t data_block);
int dm_thin_remove_block(struct dm_thin_device *td, dm_block_t block);
int dm_thin_remove_range(struct dm_thin_device *td,
dm_block_t begin, dm_block_t end);
/*
* Queries.
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment