Commit 0be600a5 authored by Linus Torvalds

Merge tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - DM core fixes to ensure that bio submission follows a depth-first
   tree walk; this is critical to allow forward progress without the
   need to use the bioset's BIOSET_NEED_RESCUER.

 - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.

 - DM core cleanups and improvements to make bio-based DM more efficient
   (e.g. reduced memory footprint as well as leveraging per-bio-data more).

 - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
   the more direct IO submission path in the block layer; this mode is
   used by DM multipath and also optimizes targets like DM thin-pool
   that stack directly on NVMe data devices.

 - DM multipath improvements to factor out legacy SCSI-only (e.g.
   scsi_dh) code paths to allow for more optimized support for NVMe
   multipath.

 - A fix for DM multipath path selectors (service-time and queue-length)
   to select paths in a more balanced way; largely academic but doesn't
   hurt.

 - Numerous DM raid target fixes and improvements.

 - Add a new DM "unstriped" target that enables Intel to work around
   firmware limitations in some NVMe drives that are striped internally
   (this target also works when stacked above the DM "striped" target).

 - Various Documentation fixes and improvements.

 - Misc cleanups and fixes across various DM infrastructure and targets
   (e.g. bufio, flakey, log-writes, snapshot).

* tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits)
  dm cache: Documentation: update default migration_throttling value
  dm mpath selector: more evenly distribute ties
  dm unstripe: fix target length versus number of stripes size check
  dm thin: fix trailing semicolon in __remap_and_issue_shared_cell
  dm table: fix NVMe bio-based dm_table_determine_type() validation
  dm: various cleanups to md->queue initialization code
  dm mpath: delay the retry of a request if the target responded as busy
  dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED
  dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure
  dm log writes: fix max length used for kstrndup
  dm: backfill missing calls to mutex_destroy()
  dm snapshot: use mutex instead of rw_semaphore
  dm flakey: check for null arg_name in parse_features()
  dm thin: extend thinpool status format string with omitted fields
  dm thin: fixes in thin-provisioning.txt
  dm thin: document representation of <highest mapped sector> when there is none
  dm thin: fix documentation relative to low water mark threshold
  dm cache: be consistent in specifying sectors and SI units in cache.txt
  dm cache: delete obsoleted paragraph in cache.txt
  dm cache: fix grammar in cache-policies.txt
  ...
parents 040639b7 9614e2ba
......@@ -60,7 +60,7 @@ Memory usage:
The mq policy used a lot of memory; 88 bytes per cache block on a 64
bit machine.
smq uses 28bit indexes to implement it's data structures rather than
smq uses 28bit indexes to implement its data structures rather than
pointers. It avoids storing an explicit hit count for each block. It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
......@@ -84,7 +84,7 @@ resulting in better promotion/demotion decisions.
Adaptability:
The mq policy maintained a hit count for each cache block. For a
different block to get promoted to the cache it's hit count has to
different block to get promoted to the cache its hit count has to
exceed the lowest currently in the cache. This meant it could take a
long time for the cache to adapt between varying IO patterns.
......
......@@ -59,7 +59,7 @@ Fixed block size
The origin is divided up into blocks of a fixed size. This block size
is configurable when you first create the cache. Typically we've been
using block sizes of 256KB - 1024KB. The block size must be between 64
(32KB) and 2097152 (1GB) and a multiple of 64 (32KB).
sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).
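As a purely illustrative example (the device names, origin size and the
writeback/default policy choices below are placeholders, not part of this
document), a cache table using a 512-sector (256KB) block size could look like:

  dmsetup create my-cache --table \
    '0 41943040 cache /dev/mapper/meta /dev/mapper/fast /dev/mapper/slow 512 1 writeback default 0'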
Having a fixed block size simplifies the target a lot. But it is
something of a compromise. For instance, a small part of a block may be
......@@ -119,7 +119,7 @@ doing here to avoid migrating during those peak io moments.
For the time being, a message "migration_threshold <#sectors>"
can be used to set the maximum number of sectors being migrated,
the default being 204800 sectors (or 100MB).
the default being 2048 sectors (1MB).
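For example, on a hypothetical cache device named 'my-cache', the threshold
could be raised back to the previous default with:

  dmsetup message my-cache 0 migration_threshold 204800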
Updating on-disk metadata
-------------------------
......@@ -143,11 +143,6 @@ the policy how big this chunk is, but it should be kept small. Like the
dirty flags this data is lost if there's a crash so a safe fallback
value should always be possible.
For instance, the 'mq' policy, which is currently the default policy,
uses this facility to store the hit count of the cache blocks. If
there's a crash this information will be lost, which means the cache
may be less efficient until those hit counts are regenerated.
Policy hints affect performance, not correctness.
Policy messaging
......
......@@ -343,5 +343,8 @@ Version History
1.11.0 Fix table line argument order
(wrong raid10_copies/raid10_format sequence)
1.11.1 Add raid4/5/6 journal write-back support via journal_mode option
1.12.1 fix for MD deadlock between mddev_suspend() and md_write_start() available
1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available
1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A')
1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size and
       state races.
1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen
......@@ -49,6 +49,10 @@ The difference between persistent and transient is with transient
snapshots less metadata must be saved on disk - they can be kept in
memory by the kernel.
When loading or unloading the snapshot target, the corresponding
snapshot-origin or snapshot-merge target must be suspended. A failure to
suspend the origin target could result in data corruption.
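A minimal sketch of that sequence (the device names, 1GB length and 16-sector
chunk size are hypothetical; 'home-origin' is assumed to be a snapshot-origin
device stacked on /dev/vg/home):

  dmsetup suspend home-origin
  dmsetup create home-snap --table '0 2097152 snapshot /dev/vg/home /dev/vg/home-cow P 16'
  dmsetup resume home-origin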
* snapshot-merge <origin> <COW device> <persistent> <chunksize>
......
......@@ -112,9 +112,11 @@ $low_water_mark is expressed in blocks of size $data_block_size. If
free space on the data device drops below this level then a dm event
will be triggered which a userspace daemon should catch allowing it to
extend the pool device. Only one such event will be sent.
Resuming a device with a new table itself triggers an event so the
userspace daemon can use this to detect a situation where a new table
already exceeds the threshold.
No special event is triggered if a just resumed device's free space is below
the low water mark. However, resuming a device always triggers an
event; a userspace daemon should verify that free space exceeds the low
water mark when handling this event.
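A rough sketch of how a daemon might handle such an event (the device name,
sizes and exact policy are up to userspace and purely illustrative):

  dmsetup status pool          # inspect <used data blocks>/<total data blocks>
  # if the remaining data blocks are at or below the low water mark,
  # grow the underlying data device, then:
  dmsetup suspend pool
  dmsetup reload pool --table '0 <new length> thin-pool <metadata dev> <data dev> <data block size> <low water mark>'
  dmsetup resume pool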
A low water mark for the metadata device is maintained in the kernel and
will trigger a dm event if free space on the metadata device drops below
......@@ -274,7 +276,8 @@ ii) Status
<transaction id> <used metadata blocks>/<total metadata blocks>
<used data blocks>/<total data blocks> <held metadata root>
[no_]discard_passdown ro|rw
ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
needs_check|-
transaction id:
A 64-bit number used by userspace to help synchronise with metadata
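Purely as an illustration of the pool status format above (all values are
invented and the held metadata root is shown as '-'), a healthy pool might
report:

  0 141/4161600 10685/1310720 - rw discard_passdown queue_if_no_space -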
......@@ -394,3 +397,6 @@ ii) Status
If the pool has encountered device errors and failed, the status
will just contain the string 'Fail'. The userspace recovery
tools should then be used.
In the case where <nr mapped sectors> is 0, there is no highest
mapped sector and the value of <highest mapped sector> is unspecified.
Introduction
============
The device-mapper "unstriped" target provides a transparent mechanism to
unstripe a device-mapper "striped" target to access the underlying disks
without having to touch the true backing block-device. It can also be
used to unstripe a hardware RAID-0 to access backing disks.
Parameters:
<number of stripes> <chunk size> <stripe #> <dev_path> <offset>
<number of stripes>
The number of stripes in the RAID 0.
<chunk size>
The size of each chunk in 512-byte sectors.
<dev_path>
The block device you wish to unstripe.
<stripe #>
The stripe number within the device that corresponds to the physical
drive you wish to unstripe. This must be 0-indexed.
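For instance (the device path and target length are placeholders), a table
line that extracts stripe 0 from a 4-member set striped with 256-sector
chunks would be:

  0 262144 unstriped 4 256 0 /dev/nvme0n1 0

Complete dmsetup invocations are shown in the examples further below.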
Why use this module?
====================
An example of undoing an existing dm-stripe
-------------------------------------------
This small bash script will set up 4 loop devices and use the existing
striped target to combine the 4 devices into one. It will then use
the unstriped target on top of the striped device to access the
individual backing loop devices. We write data to the newly exposed
unstriped devices and verify that the data written matches the correct
underlying device on the striped array.
#!/bin/bash
MEMBER_SIZE=$((128 * 1024 * 1024))
NUM=4
SEQ_END=$((${NUM}-1))
CHUNK=256
BS=4096
RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
COUNT=$((${MEMBER_SIZE} / ${BS}))
for i in $(seq 0 ${SEQ_END}); do
dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
losetup /dev/loop${i} member-${i}
DM_PARMS+=" /dev/loop${i} 0"
done
echo $DM_PARMS | dmsetup create raid0
for i in $(seq 0 ${SEQ_END}); do
echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
done;
for i in $(seq 0 ${SEQ_END}); do
dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
diff /dev/mapper/set-${i} member-${i}
done;
for i in $(seq 0 ${SEQ_END}); do
dmsetup remove set-${i}
done
dmsetup remove raid0
for i in $(seq 0 ${SEQ_END}); do
losetup -d /dev/loop${i}
rm -f member-${i}
done
Another example
---------------
Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k chunk on each core, resulting
in a 256k stripe across the two cores:
   Core 0:        Core 1:
  __________     __________
  | LBA 512|     | LBA 768|
  | LBA 0  |     | LBA 256|
  ----------     ----------
The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition. This is because the partitions
are striped across the two cores. When we unstripe this hardware RAID 0
and create partitions on each newly exposed device, the two partitions are
physically separated.
With the dm-unstriped target we are able to segregate an fio workload
whose read and write jobs are independent of each other. Compared to
running the same test on a combined drive with partitions, we saw a 92%
reduction in read latency when using this device-mapper target.
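A minimal shell sketch of such a segregated fio run, using the nvmset0/nvmset1
devices created below (the job parameters are assumptions for illustration,
not the configuration behind the quoted numbers):

  fio --name=reader --filename=/dev/mapper/nvmset0 --rw=randread --bs=4k \
      --iodepth=32 --direct=1 --runtime=60 --time_based &
  fio --name=writer --filename=/dev/mapper/nvmset1 --rw=randwrite --bs=128k \
      --iodepth=32 --direct=1 --runtime=60 --time_based
  wait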
Example dmsetup usage
=====================
unstriped on top of Intel NVMe device that has 2 cores
------------------------------------------------------
dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'
There will now be two devices that expose Intel NVMe cores 0 and 1,
respectively:
/dev/mapper/nvmset0
/dev/mapper/nvmset1
unstriped on top of striped with 4 drives using 128K chunk size
---------------------------------------------------------------
dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
......@@ -269,6 +269,13 @@ config DM_BIO_PRISON
source "drivers/md/persistent-data/Kconfig"
config DM_UNSTRIPED
tristate "Unstriped target"
depends on BLK_DEV_DM
---help---
Unstripes I/O so it is issued solely on a single drive in a HW
RAID0 or dm-striped target.
config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
......
......@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE) += bcache/
obj-$(CONFIG_BLK_DEV_MD) += md-mod.o
obj-$(CONFIG_BLK_DEV_DM) += dm-mod.o
obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
obj-$(CONFIG_DM_UNSTRIPED) += dm-unstripe.o
obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
obj-$(CONFIG_DM_BIO_PRISON) += dm-bio-prison.o
obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
......
......@@ -662,7 +662,7 @@ static void submit_io(struct dm_buffer *b, int rw, bio_end_io_t *end_io)
sector = (b->block << b->c->sectors_per_block_bits) + b->c->start;
if (rw != WRITE) {
if (rw != REQ_OP_WRITE) {
n_sectors = 1 << b->c->sectors_per_block_bits;
offset = 0;
} else {
......@@ -740,7 +740,7 @@ static void __write_dirty_buffer(struct dm_buffer *b,
b->write_end = b->dirty_end;
if (!write_list)
submit_io(b, WRITE, write_endio);
submit_io(b, REQ_OP_WRITE, write_endio);
else
list_add_tail(&b->write_list, write_list);
}
......@@ -753,7 +753,7 @@ static void __flush_write_list(struct list_head *write_list)
struct dm_buffer *b =
list_entry(write_list->next, struct dm_buffer, write_list);
list_del(&b->write_list);
submit_io(b, WRITE, write_endio);
submit_io(b, REQ_OP_WRITE, write_endio);
cond_resched();
}
blk_finish_plug(&plug);
......@@ -1123,7 +1123,7 @@ static void *new_read(struct dm_bufio_client *c, sector_t block,
return NULL;
if (need_submit)
submit_io(b, READ, read_endio);
submit_io(b, REQ_OP_READ, read_endio);
wait_on_bit_io(&b->state, B_READING, TASK_UNINTERRUPTIBLE);
......@@ -1193,7 +1193,7 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
dm_bufio_unlock(c);
if (need_submit)
submit_io(b, READ, read_endio);
submit_io(b, REQ_OP_READ, read_endio);
dm_bufio_release(b);
cond_resched();
......@@ -1454,7 +1454,7 @@ void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block)
old_block = b->block;
__unlink_buffer(b);
__link_buffer(b, new_block, b->list_mode);
submit_io(b, WRITE, write_endio);
submit_io(b, REQ_OP_WRITE, write_endio);
wait_on_bit_io(&b->state, B_WRITING,
TASK_UNINTERRUPTIBLE);
__unlink_buffer(b);
......@@ -1716,7 +1716,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
if (!DM_BUFIO_CACHE_NAME(c)) {
r = -ENOMEM;
mutex_unlock(&dm_bufio_clients_lock);
goto bad_cache;
goto bad;
}
}
......@@ -1727,7 +1727,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
if (!DM_BUFIO_CACHE(c)) {
r = -ENOMEM;
mutex_unlock(&dm_bufio_clients_lock);
goto bad_cache;
goto bad;
}
}
}
......@@ -1738,27 +1738,28 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
if (!b) {
r = -ENOMEM;
goto bad_buffer;
goto bad;
}
__free_buffer_wake(b);
}
c->shrinker.count_objects = dm_bufio_shrink_count;
c->shrinker.scan_objects = dm_bufio_shrink_scan;
c->shrinker.seeks = 1;
c->shrinker.batch = 0;
r = register_shrinker(&c->shrinker);
if (r)
goto bad;
mutex_lock(&dm_bufio_clients_lock);
dm_bufio_client_count++;
list_add(&c->client_list, &dm_bufio_all_clients);
__cache_size_refresh();
mutex_unlock(&dm_bufio_clients_lock);
c->shrinker.count_objects = dm_bufio_shrink_count;
c->shrinker.scan_objects = dm_bufio_shrink_scan;
c->shrinker.seeks = 1;
c->shrinker.batch = 0;
register_shrinker(&c->shrinker);
return c;
bad_buffer:
bad_cache:
bad:
while (!list_empty(&c->reserved_buffers)) {
struct dm_buffer *b = list_entry(c->reserved_buffers.next,
struct dm_buffer, lru_list);
......@@ -1767,6 +1768,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
}
dm_io_client_destroy(c->dm_io);
bad_dm_io:
mutex_destroy(&c->lock);
kfree(c);
bad_client:
return ERR_PTR(r);
......@@ -1811,6 +1813,7 @@ void dm_bufio_client_destroy(struct dm_bufio_client *c)
BUG_ON(c->n_buffers[i]);
dm_io_client_destroy(c->dm_io);
mutex_destroy(&c->lock);
kfree(c);
}
EXPORT_SYMBOL_GPL(dm_bufio_client_destroy);
......
......@@ -91,8 +91,7 @@ struct mapped_device {
/*
* io objects are allocated from here.
*/
mempool_t *io_pool;
struct bio_set *io_bs;
struct bio_set *bs;
/*
......@@ -130,8 +129,6 @@ struct mapped_device {
struct srcu_struct io_barrier;
};
void dm_init_md_queue(struct mapped_device *md);
void dm_init_normal_md_queue(struct mapped_device *md);
int md_in_flight(struct mapped_device *md);
void disable_write_same(struct mapped_device *md);
void disable_write_zeroes(struct mapped_device *md);
......
......@@ -2193,6 +2193,8 @@ static void crypt_dtr(struct dm_target *ti)
kzfree(cc->cipher_auth);
kzfree(cc->authenc_key);
mutex_destroy(&cc->bio_alloc_lock);
/* Must zero key material before freeing */
kzfree(cc);
}
......@@ -2702,8 +2704,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
goto bad;
}
cc->bs = bioset_create(MIN_IOS, 0, (BIOSET_NEED_BVECS |
BIOSET_NEED_RESCUER));
cc->bs = bioset_create(MIN_IOS, 0, BIOSET_NEED_BVECS);
if (!cc->bs) {
ti->error = "Cannot allocate crypt bioset";
goto bad;
......
......@@ -229,6 +229,8 @@ static void delay_dtr(struct dm_target *ti)
if (dc->dev_write)
dm_put_device(ti, dc->dev_write);
mutex_destroy(&dc->timer_lock);
kfree(dc);
}
......
......@@ -70,6 +70,11 @@ static int parse_features(struct dm_arg_set *as, struct flakey_c *fc,
arg_name = dm_shift_arg(as);
argc--;
if (!arg_name) {
ti->error = "Insufficient feature arguments";
return -EINVAL;
}
/*
* drop_writes
*/
......
......@@ -58,8 +58,7 @@ struct dm_io_client *dm_io_client_create(void)
if (!client->pool)
goto bad;
client->bios = bioset_create(min_ios, 0, (BIOSET_NEED_BVECS |
BIOSET_NEED_RESCUER));
client->bios = bioset_create(min_ios, 0, BIOSET_NEED_BVECS);
if (!client->bios)
goto bad;
......
......@@ -477,8 +477,10 @@ static int run_complete_job(struct kcopyd_job *job)
* If this is the master job, the sub jobs have already
* completed so we can free everything.
*/
if (job->master_job == job)
if (job->master_job == job) {
mutex_destroy(&job->lock);
mempool_free(job, kc->job_pool);
}
fn(read_err, write_err, context);
if (atomic_dec_and_test(&kc->nr_jobs))
......@@ -750,6 +752,7 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
* followed by SPLIT_COUNT sub jobs.
*/
job = mempool_alloc(kc->job_pool, GFP_NOIO);
mutex_init(&job->lock);
/*
* set up for the read.
......@@ -811,7 +814,6 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
if (job->source.count <= SUB_JOB_SIZE)
dispatch_job(job);
else {
mutex_init(&job->lock);
job->progress = 0;
split_job(job);
}
......
......@@ -594,7 +594,7 @@ static int log_mark(struct log_writes_c *lc, char *data)
return -ENOMEM;
}
block->data = kstrndup(data, maxsize, GFP_KERNEL);
block->data = kstrndup(data, maxsize - 1, GFP_KERNEL);
if (!block->data) {
DMERR("Error copying mark data");
kfree(block);
......
......@@ -195,9 +195,6 @@ static struct dm_path *ql_select_path(struct path_selector *ps, size_t nr_bytes)
if (list_empty(&s->valid_paths))
goto out;
/* Change preferred (first in list) path to evenly balance. */
list_move_tail(s->valid_paths.next, &s->valid_paths);
list_for_each_entry(pi, &s->valid_paths, list) {
if (!best ||
(atomic_read(&pi->qlen) < atomic_read(&best->qlen)))
......@@ -210,6 +207,9 @@ static struct dm_path *ql_select_path(struct path_selector *ps, size_t nr_bytes)
if (!best)
goto out;
/* Move most recently used to least preferred to evenly balance. */
list_move_tail(&best->list, &s->valid_paths);
ret = best->path;
out:
spin_unlock_irqrestore(&s->lock, flags);
......
......@@ -315,6 +315,10 @@ static void dm_done(struct request *clone, blk_status_t error, bool mapped)
/* The target wants to requeue the I/O */
dm_requeue_original_request(tio, false);
break;
case DM_ENDIO_DELAY_REQUEUE:
/* The target wants to requeue the I/O after a delay */
dm_requeue_original_request(tio, true);
break;
default:
DMWARN("unimplemented target endio return value: %d", r);
BUG();
......@@ -713,7 +717,6 @@ int dm_old_init_request_queue(struct mapped_device *md, struct dm_table *t)
/* disable dm_old_request_fn's merge heuristic by default */
md->seq_rq_merge_deadline_usecs = 0;
dm_init_normal_md_queue(md);
blk_queue_softirq_done(md->queue, dm_softirq_done);
/* Initialize the request-based DM worker thread */
......@@ -821,7 +824,6 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
err = PTR_ERR(q);
goto out_tag_set;
}
dm_init_md_queue(md);
return 0;
......
......@@ -282,9 +282,6 @@ static struct dm_path *st_select_path(struct path_selector *ps, size_t nr_bytes)
if (list_empty(&s->valid_paths))
goto out;
/* Change preferred (first in list) path to evenly balance. */
list_move_tail(s->valid_paths.next, &s->valid_paths);
list_for_each_entry(pi, &s->valid_paths, list)
if (!best || (st_compare_load(pi, best, nr_bytes) < 0))
best = pi;
......@@ -292,6 +289,9 @@ static struct dm_path *st_select_path(struct path_selector *ps, size_t nr_bytes)
if (!best)
goto out;
/* Move most recently used to least preferred to evenly balance. */
list_move_tail(&best->list, &s->valid_paths);
ret = best->path;
out:
spin_unlock_irqrestore(&s->lock, flags);
......
......@@ -47,7 +47,7 @@ struct dm_exception_table {
};
struct dm_snapshot {
struct rw_semaphore lock;
struct mutex lock;
struct dm_dev *origin;
struct dm_dev *cow;
......@@ -439,9 +439,9 @@ static int __find_snapshots_sharing_cow(struct dm_snapshot *snap,
if (!bdev_equal(s->cow->bdev, snap->cow->bdev))
continue;
down_read(&s->lock);
mutex_lock(&s->lock);
active = s->active;
up_read(&s->lock);
mutex_unlock(&s->lock);
if (active) {
if (snap_src)
......@@ -909,7 +909,7 @@ static int remove_single_exception_chunk(struct dm_snapshot *s)
int r;
chunk_t old_chunk = s->first_merging_chunk + s->num_merging_chunks - 1;
down_write(&s->lock);
mutex_lock(&s->lock);
/*
* Process chunks (and associated exceptions) in reverse order
......@@ -924,7 +924,7 @@ static int remove_single_exception_chunk(struct dm_snapshot *s)
b = __release_queued_bios_after_merge(s);
out:
up_write(&s->lock);
mutex_unlock(&s->lock);
if (b)
flush_bios(b);
......@@ -983,9 +983,9 @@ static void snapshot_merge_next_chunks(struct dm_snapshot *s)
if (linear_chunks < 0) {
DMERR("Read error in exception store: "
"shutting down merge");
down_write(&s->lock);
mutex_lock(&s->lock);
s->merge_failed = 1;
up_write(&s->lock);
mutex_unlock(&s->lock);
}
goto shut;
}
......@@ -1026,10 +1026,10 @@ static void snapshot_merge_next_chunks(struct dm_snapshot *s)
previous_count = read_pending_exceptions_done_count();
}
down_write(&s->lock);
mutex_lock(&s->lock);
s->first_merging_chunk = old_chunk;
s->num_merging_chunks = linear_chunks;
up_write(&s->lock);
mutex_unlock(&s->lock);
/* Wait until writes to all 'linear_chunks' drain */
for (i = 0; i < linear_chunks; i++)
......@@ -1071,10 +1071,10 @@ static void merge_callback(int read_err, unsigned long write_err, void *context)
return;
shut:
down_write(&s->lock);
mutex_lock(&s->lock);
s->merge_failed = 1;
b = __release_queued_bios_after_merge(s);
up_write(&s->lock);
mutex_unlock(&s->lock);
error_bios(b);
merge_shutdown(s);
......@@ -1173,7 +1173,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
s->exception_start_sequence = 0;
s->exception_complete_sequence = 0;
INIT_LIST_HEAD(&s->out_of_order_list);
init_rwsem(&s->lock);
mutex_init(&s->lock);
INIT_LIST_HEAD(&s->list);
spin_lock_init(&s->pe_lock);
s->state_bits = 0;
......@@ -1338,9 +1338,9 @@ static void snapshot_dtr(struct dm_target *ti)
/* Check whether exception handover must be cancelled */
(void) __find_snapshots_sharing_cow(s, &snap_src, &snap_dest, NULL);
if (snap_src && snap_dest && (s == snap_src)) {
down_write(&snap_dest->lock);
mutex_lock(&snap_dest->lock);
snap_dest->valid = 0;
up_write(&snap_dest->lock);
mutex_unlock(&snap_dest->lock);
DMERR("Cancelling snapshot handover.");
}
up_read(&_origins_lock);
......@@ -1371,6 +1371,8 @@ static void snapshot_dtr(struct dm_target *ti)
dm_exception_store_destroy(s->store);
mutex_destroy(&s->lock);
dm_put_device(ti, s->cow);
dm_put_device(ti, s->origin);
......@@ -1458,7 +1460,7 @@ static void pending_complete(void *context, int success)
if (!success) {
/* Read/write error - snapshot is unusable */
down_write(&s->lock);
mutex_lock(&s->lock);
__invalidate_snapshot(s, -EIO);
error = 1;
goto out;
......@@ -1466,14 +1468,14 @@ static void pending_complete(void *context, int success)
e = alloc_completed_exception(GFP_NOIO);
if (!e) {
down_write(&s->lock);
mutex_lock(&s->lock);
__invalidate_snapshot(s, -ENOMEM);
error = 1;
goto out;
}
*e = pe->e;
down_write(&s->lock);
mutex_lock(&s->lock);
if (!s->valid) {
free_completed_exception(e);
error = 1;
......@@ -1498,7 +1500,7 @@ static void pending_complete(void *context, int success)
full_bio->bi_end_io = pe->full_bio_end_io;
increment_pending_exceptions_done_count();
up_write(&s->lock);
mutex_unlock(&s->lock);
/* Submit any pending write bios */
if (error) {
......@@ -1694,7 +1696,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
/* FIXME: should only take write lock if we need
* to copy an exception */
down_write(&s->lock);
mutex_lock(&s->lock);
if (!s->valid || (unlikely(s->snapshot_overflowed) &&
bio_data_dir(bio) == WRITE)) {
......@@ -1717,9 +1719,9 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
if (bio_data_dir(bio) == WRITE) {
pe = __lookup_pending_exception(s, chunk);
if (!pe) {
up_write(&s->lock);
mutex_unlock(&s->lock);
pe = alloc_pending_exception(s);
down_write(&s->lock);
mutex_lock(&s->lock);
if (!s->valid || s->snapshot_overflowed) {
free_pending_exception(pe);
......@@ -1754,7 +1756,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
bio->bi_iter.bi_size ==
(s->store->chunk_size << SECTOR_SHIFT)) {
pe->started = 1;
up_write(&s->lock);
mutex_unlock(&s->lock);
start_full_bio(pe, bio);
goto out;
}
......@@ -1764,7 +1766,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
if (!pe->started) {
/* this is protected by snap->lock */
pe->started = 1;
up_write(&s->lock);
mutex_unlock(&s->lock);
start_copy(pe);
goto out;
}
......@@ -1774,7 +1776,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
}
out_unlock:
up_write(&s->lock);
mutex_unlock(&s->lock);
out:
return r;
}
......@@ -1810,7 +1812,7 @@ static int snapshot_merge_map(struct dm_target *ti, struct bio *bio)
chunk = sector_to_chunk(s->store, bio->bi_iter.bi_sector);
down_write(&s->lock);
mutex_lock(&s->lock);
/* Full merging snapshots are redirected to the origin */
if (!s->valid)
......@@ -1841,12 +1843,12 @@ static int snapshot_merge_map(struct dm_target *ti, struct bio *bio)
bio_set_dev(bio, s->origin->bdev);
if (bio_data_dir(bio) == WRITE) {
up_write(&s->lock);
mutex_unlock(&s->lock);
return do_origin(s->origin, bio);
}
out_unlock:
up_write(&s->lock);
mutex_unlock(&s->lock);
return r;
}
......@@ -1878,7 +1880,7 @@ static int snapshot_preresume(struct dm_target *ti)
down_read(&_origins_lock);
(void) __find_snapshots_sharing_cow(s, &snap_src, &snap_dest, NULL);
if (snap_src && snap_dest) {
down_read(&snap_src->lock);
mutex_lock(&snap_src->lock);
if (s == snap_src) {
DMERR("Unable to resume snapshot source until "
"handover completes.");
......@@ -1888,7 +1890,7 @@ static int snapshot_preresume(struct dm_target *ti)
"source is suspended.");
r = -EINVAL;
}
up_read(&snap_src->lock);
mutex_unlock(&snap_src->lock);
}
up_read(&_origins_lock);
......@@ -1934,11 +1936,11 @@ static void snapshot_resume(struct dm_target *ti)
(void) __find_snapshots_sharing_cow(s, &snap_src, &snap_dest, NULL);
if (snap_src && snap_dest) {
down_write(&snap_src->lock);
down_write_nested(&snap_dest->lock, SINGLE_DEPTH_NESTING);
mutex_lock(&snap_src->lock);
mutex_lock_nested(&snap_dest->lock, SINGLE_DEPTH_NESTING);
__handover_exceptions(snap_src, snap_dest);
up_write(&snap_dest->lock);
up_write(&snap_src->lock);
mutex_unlock(&snap_dest->lock);
mutex_unlock(&snap_src->lock);
}
up_read(&_origins_lock);
......@@ -1953,9 +1955,9 @@ static void snapshot_resume(struct dm_target *ti)
/* Now we have correct chunk size, reregister */
reregister_snapshot(s);
down_write(&s->lock);
mutex_lock(&s->lock);
s->active = 1;
up_write(&s->lock);
mutex_unlock(&s->lock);
}
static uint32_t get_origin_minimum_chunksize(struct block_device *bdev)
......@@ -1995,7 +1997,7 @@ static void snapshot_status(struct dm_target *ti, status_type_t type,
switch (type) {
case STATUSTYPE_INFO:
down_write(&snap->lock);
mutex_lock(&snap->lock);
if (!snap->valid)
DMEMIT("Invalid");
......@@ -2020,7 +2022,7 @@ static void snapshot_status(struct dm_target *ti, status_type_t type,
DMEMIT("Unknown");
}
up_write(&snap->lock);
mutex_unlock(&snap->lock);
break;
......@@ -2086,7 +2088,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
if (dm_target_is_snapshot_merge(snap->ti))
continue;
down_write(&snap->lock);
mutex_lock(&snap->lock);
/* Only deal with valid and active snapshots */
if (!snap->valid || !snap->active)
......@@ -2113,9 +2115,9 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
pe = __lookup_pending_exception(snap, chunk);
if (!pe) {
up_write(&snap->lock);
mutex_unlock(&snap->lock);
pe = alloc_pending_exception(snap);
down_write(&snap->lock);
mutex_lock(&snap->lock);
if (!snap->valid) {
free_pending_exception(pe);
......@@ -2158,7 +2160,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
}
next_snapshot:
up_write(&snap->lock);
mutex_unlock(&snap->lock);
if (pe_to_start_now) {
start_copy(pe_to_start_now);
......
......@@ -228,6 +228,7 @@ void dm_stats_cleanup(struct dm_stats *stats)
dm_stat_free(&s->rcu_head);
}
free_percpu(stats->last);
mutex_destroy(&stats->mutex);
}
static int dm_stats_create(struct dm_stats *stats, sector_t start, sector_t end,
......
......@@ -866,7 +866,8 @@ EXPORT_SYMBOL(dm_consume_args);
static bool __table_type_bio_based(enum dm_queue_mode table_type)
{
return (table_type == DM_TYPE_BIO_BASED ||
table_type == DM_TYPE_DAX_BIO_BASED);
table_type == DM_TYPE_DAX_BIO_BASED ||
table_type == DM_TYPE_NVME_BIO_BASED);
}
static bool __table_type_request_based(enum dm_queue_mode table_type)
......@@ -909,13 +910,33 @@ static bool dm_table_supports_dax(struct dm_table *t)
return true;
}
static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
struct verify_rq_based_data {
unsigned sq_count;
unsigned mq_count;
};
static int device_is_rq_based(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
struct request_queue *q = bdev_get_queue(dev->bdev);
struct verify_rq_based_data *v = data;
if (q->mq_ops)
v->mq_count++;
else
v->sq_count++;
return queue_is_rq_based(q);
}
static int dm_table_determine_type(struct dm_table *t)
{
unsigned i;
unsigned bio_based = 0, request_based = 0, hybrid = 0;
unsigned sq_count = 0, mq_count = 0;
struct verify_rq_based_data v = {.sq_count = 0, .mq_count = 0};
struct dm_target *tgt;
struct dm_dev_internal *dd;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
......@@ -923,6 +944,14 @@ static int dm_table_determine_type(struct dm_table *t)
/* target already set the table's type */
if (t->type == DM_TYPE_BIO_BASED)
return 0;
else if (t->type == DM_TYPE_NVME_BIO_BASED) {
if (!dm_table_does_not_support_partial_completion(t)) {
DMERR("nvme bio-based is only possible with devices"
" that don't support partial completion");
return -EINVAL;
}
/* Fallthru, also verify all devices are blk-mq */
}
BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
goto verify_rq_based;
}
......@@ -937,7 +966,7 @@ static int dm_table_determine_type(struct dm_table *t)
bio_based = 1;
if (bio_based && request_based) {
DMWARN("Inconsistent table: different target types"
DMERR("Inconsistent table: different target types"
" can't be mixed up");
return -EINVAL;
}
......@@ -959,8 +988,18 @@ static int dm_table_determine_type(struct dm_table *t)
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
if (dm_table_supports_dax(t) ||
(list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED))
(list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
} else {
/* Check if upgrading to NVMe bio-based is valid or required */
tgt = dm_table_get_immutable_target(t);
if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
t->type = DM_TYPE_NVME_BIO_BASED;
goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
t->type = DM_TYPE_NVME_BIO_BASED;
}
}
return 0;
}
......@@ -980,7 +1019,8 @@ static int dm_table_determine_type(struct dm_table *t)
* (e.g. request completion process for partial completion.)
*/
if (t->num_targets > 1) {
DMWARN("Request-based dm doesn't support multiple targets yet");
DMERR("%s DM doesn't support multiple targets",
t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
return -EINVAL;
}
......@@ -997,28 +1037,29 @@ static int dm_table_determine_type(struct dm_table *t)
return 0;
}
/* Non-request-stackable devices can't be used for request-based dm */
list_for_each_entry(dd, devices, list) {
struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);
if (!queue_is_rq_based(q)) {
DMERR("table load rejected: including"
" non-request-stackable devices");
tgt = dm_table_get_immutable_target(t);
if (!tgt) {
DMERR("table load rejected: immutable target is required");
return -EINVAL;
} else if (tgt->max_io_len) {
DMERR("table load rejected: immutable target that splits IO is not supported");
return -EINVAL;
}
if (q->mq_ops)
mq_count++;
else
sq_count++;
/* Non-request-stackable devices can't be used for request-based dm */
if (!tgt->type->iterate_devices ||
!tgt->type->iterate_devices(tgt, device_is_rq_based, &v)) {
DMERR("table load rejected: including non-request-stackable devices");
return -EINVAL;
}
if (sq_count && mq_count) {
if (v.sq_count && v.mq_count) {
DMERR("table load rejected: not all devices are blk-mq request-stackable");
return -EINVAL;
}
t->all_blk_mq = mq_count > 0;
t->all_blk_mq = v.mq_count > 0;
if (t->type == DM_TYPE_MQ_REQUEST_BASED && !t->all_blk_mq) {
if (!t->all_blk_mq &&
(t->type == DM_TYPE_MQ_REQUEST_BASED || t->type == DM_TYPE_NVME_BIO_BASED)) {
DMERR("table load rejected: all devices are not blk-mq request-stackable");
return -EINVAL;
}
......@@ -1079,7 +1120,8 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *
{
enum dm_queue_mode type = dm_table_get_type(t);
unsigned per_io_data_size = 0;
struct dm_target *tgt;
unsigned min_pool_size = 0;
struct dm_target *ti;
unsigned i;
if (unlikely(type == DM_TYPE_NONE)) {
......@@ -1089,11 +1131,13 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *
if (__table_type_bio_based(type))
for (i = 0; i < t->num_targets; i++) {
tgt = t->targets + i;
per_io_data_size = max(per_io_data_size, tgt->per_io_data_size);
ti = t->targets + i;
per_io_data_size = max(per_io_data_size, ti->per_io_data_size);
min_pool_size = max(min_pool_size, ti->num_flush_bios);
}
t->mempools = dm_alloc_md_mempools(md, type, t->integrity_supported, per_io_data_size);
t->mempools = dm_alloc_md_mempools(md, type, t->integrity_supported,
per_io_data_size, min_pool_size);
if (!t->mempools)
return -ENOMEM;
......@@ -1705,6 +1749,20 @@ static bool dm_table_all_devices_attribute(struct dm_table *t,
return true;
}
static int device_no_partial_completion(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
char b[BDEVNAME_SIZE];
/* For now, NVMe devices are the only devices of this class */
return (strncmp(bdevname(dev->bdev, b), "nvme", 3) == 0);
}
static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
{
return dm_table_all_devices_attribute(t, device_no_partial_completion);
}
static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
......@@ -1820,6 +1878,8 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);
if (dm_table_supports_dax(t))
queue_flag_set_unlocked(QUEUE_FLAG_DAX, q);
if (dm_table_supports_dax_write_cache(t))
dax_write_cache(t->md->dax_dev, true);
......
......@@ -492,6 +492,11 @@ static void pool_table_init(void)
INIT_LIST_HEAD(&dm_thin_pool_table.pools);
}
static void pool_table_exit(void)
{
mutex_destroy(&dm_thin_pool_table.mutex);
}
static void __pool_table_insert(struct pool *pool)
{
BUG_ON(!mutex_is_locked(&dm_thin_pool_table.mutex));
......@@ -1717,7 +1722,7 @@ static void __remap_and_issue_shared_cell(void *context,
bio_op(bio) == REQ_OP_DISCARD)
bio_list_add(&info->defer_bios, bio);
else {
struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));;
struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
h->shared_read_entry = dm_deferred_entry_inc(info->tc->pool->shared_read_ds);
inc_all_io_entry(info->tc->pool, bio);
......@@ -4387,6 +4392,8 @@ static void dm_thin_exit(void)
dm_unregister_target(&pool_target);
kmem_cache_destroy(_new_mapping_cache);
pool_table_exit();
}
module_init(dm_thin_init);
......
/*
* Copyright (C) 2017 Intel Corporation.
*
* This file is released under the GPL.
*/
#include "dm.h"
#include <linux/module.h>
#include <linux/init.h>
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/slab.h>
#include <linux/bitops.h>
#include <linux/device-mapper.h>
struct unstripe_c {
struct dm_dev *dev;
sector_t physical_start;
uint32_t stripes;
uint32_t unstripe;
sector_t unstripe_width;
sector_t unstripe_offset;
uint32_t chunk_size;
u8 chunk_shift;
};
#define DM_MSG_PREFIX "unstriped"
static void cleanup_unstripe(struct unstripe_c *uc, struct dm_target *ti)
{
if (uc->dev)
dm_put_device(ti, uc->dev);
kfree(uc);
}
/*
* Construct an unstriped mapping.
* <number of stripes> <chunk size> <stripe #> <dev_path> <offset>
*/
static int unstripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
struct unstripe_c *uc;
sector_t tmp_len;
unsigned long long start;
char dummy;
if (argc != 5) {
ti->error = "Invalid number of arguments";
return -EINVAL;
}
uc = kzalloc(sizeof(*uc), GFP_KERNEL);
if (!uc) {
ti->error = "Memory allocation for unstriped context failed";
return -ENOMEM;
}
if (kstrtouint(argv[0], 10, &uc->stripes) || !uc->stripes) {
ti->error = "Invalid stripe count";
goto err;
}
if (kstrtouint(argv[1], 10, &uc->chunk_size) || !uc->chunk_size) {
ti->error = "Invalid chunk_size";
goto err;
}
// FIXME: must support non power of 2 chunk_size, dm-stripe.c does
if (!is_power_of_2(uc->chunk_size)) {
ti->error = "Non power of 2 chunk_size is not supported yet";
goto err;
}
if (kstrtouint(argv[2], 10, &uc->unstripe)) {
ti->error = "Invalid stripe number";
goto err;
}
if (uc->unstripe > uc->stripes && uc->stripes > 1) {
ti->error = "Please provide stripe between [0, # of stripes]";
goto err;
}
if (dm_get_device(ti, argv[3], dm_table_get_mode(ti->table), &uc->dev)) {
ti->error = "Couldn't get striped device";
goto err;
}
if (sscanf(argv[4], "%llu%c", &start, &dummy) != 1) {
ti->error = "Invalid striped device offset";
goto err;
}
uc->physical_start = start;
uc->unstripe_offset = uc->unstripe * uc->chunk_size;
uc->unstripe_width = (uc->stripes - 1) * uc->chunk_size;
uc->chunk_shift = fls(uc->chunk_size) - 1;
tmp_len = ti->len;
if (sector_div(tmp_len, uc->chunk_size)) {
ti->error = "Target length not divisible by chunk size";
goto err;
}
if (dm_set_target_max_io_len(ti, uc->chunk_size)) {
ti->error = "Failed to set max io len";
goto err;
}
ti->private = uc;
return 0;
err:
cleanup_unstripe(uc, ti);
return -EINVAL;
}
static void unstripe_dtr(struct dm_target *ti)
{
struct unstripe_c *uc = ti->private;
cleanup_unstripe(uc, ti);
}
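/*
 * Map a sector of the unstriped (single member) view back to the
 * corresponding sector within the underlying striped layout.
 *
 * Worked example (illustrative only, not taken from the patch): with
 * stripes = 4, chunk_size = 8 sectors and unstripe = 1, unstripe_offset
 * is 8 and unstripe_width is 24.  Logical sector 10 lies in chunk row 1
 * (10 >> 3), so it maps to 10 + 24 * 1 + 8 = 42 on the striped device,
 * i.e. offset 2 into member 1's chunk in row 1.
 */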
static sector_t map_to_core(struct dm_target *ti, struct bio *bio)
{
struct unstripe_c *uc = ti->private;
sector_t sector = bio->bi_iter.bi_sector;
/* Shift us up to the right "row" on the stripe */
sector += uc->unstripe_width * (sector >> uc->chunk_shift);
/* Account for what stripe we're operating on */
sector += uc->unstripe_offset;
return sector;
}
static int unstripe_map(struct dm_target *ti, struct bio *bio)
{
struct unstripe_c *uc = ti->private;
bio_set_dev(bio, uc->dev->bdev);
bio->bi_iter.bi_sector = map_to_core(ti, bio) + uc->physical_start;
return DM_MAPIO_REMAPPED;
}
static void unstripe_status(struct dm_target *ti, status_type_t type,
unsigned int status_flags, char *result, unsigned int maxlen)
{
struct unstripe_c *uc = ti->private;
unsigned int sz = 0;
switch (type) {
case STATUSTYPE_INFO:
break;
case STATUSTYPE_TABLE:
DMEMIT("%d %llu %d %s %llu",
uc->stripes, (unsigned long long)uc->chunk_size, uc->unstripe,
uc->dev->name, (unsigned long long)uc->physical_start);
break;
}
}
static int unstripe_iterate_devices(struct dm_target *ti,
iterate_devices_callout_fn fn, void *data)
{
struct unstripe_c *uc = ti->private;
return fn(ti, uc->dev, uc->physical_start, ti->len, data);
}
static void unstripe_io_hints(struct dm_target *ti,
struct queue_limits *limits)
{
struct unstripe_c *uc = ti->private;
limits->chunk_sectors = uc->chunk_size;
}
static struct target_type unstripe_target = {
.name = "unstriped",
.version = {1, 0, 0},
.module = THIS_MODULE,
.ctr = unstripe_ctr,
.dtr = unstripe_dtr,
.map = unstripe_map,
.status = unstripe_status,
.iterate_devices = unstripe_iterate_devices,
.io_hints = unstripe_io_hints,
};
static int __init dm_unstripe_init(void)
{
int r;
r = dm_register_target(&unstripe_target);
if (r < 0)
DMERR("target registration failed");
return r;
}
static void __exit dm_unstripe_exit(void)
{
dm_unregister_target(&unstripe_target);
}
module_init(dm_unstripe_init);
module_exit(dm_unstripe_exit);
MODULE_DESCRIPTION(DM_NAME " unstriped target");
MODULE_AUTHOR("Scott Bauer <scott.bauer@intel.com>");
MODULE_LICENSE("GPL");
......@@ -2333,6 +2333,9 @@ static void dmz_cleanup_metadata(struct dmz_metadata *zmd)
/* Free the zone descriptors */
dmz_drop_zones(zmd);
mutex_destroy(&zmd->mblk_flush_lock);
mutex_destroy(&zmd->map_lock);
}
/*
......
......@@ -827,6 +827,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
err_cwq:
destroy_workqueue(dmz->chunk_wq);
err_bio:
mutex_destroy(&dmz->chunk_lock);
bioset_free(dmz->bio_set);
err_meta:
dmz_dtr_metadata(dmz->metadata);
......@@ -861,6 +862,8 @@ static void dmz_dtr(struct dm_target *ti)
dmz_put_zoned_device(ti);
mutex_destroy(&dmz->chunk_lock);
kfree(dmz);
}
......
......@@ -49,7 +49,6 @@ struct dm_md_mempools;
/*-----------------------------------------------------------------
* Internal table functions.
*---------------------------------------------------------------*/
void dm_table_destroy(struct dm_table *t);
void dm_table_event_callback(struct dm_table *t,
void (*fn)(void *), void *context);
struct dm_target *dm_table_get_target(struct dm_table *t, unsigned int index);
......@@ -206,7 +205,8 @@ void dm_kcopyd_exit(void);
* Mempool operations
*/
struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_queue_mode type,
unsigned integrity, unsigned per_bio_data_size);
unsigned integrity, unsigned per_bio_data_size,
unsigned min_pool_size);
void dm_free_md_mempools(struct dm_md_mempools *pools);
/*
......
......@@ -28,6 +28,7 @@ enum dm_queue_mode {
DM_TYPE_REQUEST_BASED = 2,
DM_TYPE_MQ_REQUEST_BASED = 3,
DM_TYPE_DAX_BIO_BASED = 4,
DM_TYPE_NVME_BIO_BASED = 5,
};
typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
......@@ -220,14 +221,6 @@ struct target_type {
#define DM_TARGET_WILDCARD 0x00000008
#define dm_target_is_wildcard(type) ((type)->features & DM_TARGET_WILDCARD)
/*
* Some targets need to be sent the same WRITE bio severals times so
* that they can send copies of it to different devices. This function
* examines any supplied bio and returns the number of copies of it the
* target requires.
*/
typedef unsigned (*dm_num_write_bios_fn) (struct dm_target *ti, struct bio *bio);
/*
* A target implements own bio data integrity.
*/
......@@ -291,13 +284,6 @@ struct dm_target {
*/
unsigned per_io_data_size;
/*
* If defined, this function is called to find out how many
* duplicate bios should be sent to the target when writing
* data.
*/
dm_num_write_bios_fn num_write_bios;
/* target specific data */
void *private;
......@@ -329,35 +315,9 @@ struct dm_target_callbacks {
int (*congested_fn) (struct dm_target_callbacks *, int);
};
/*
* For bio-based dm.
* One of these is allocated for each bio.
* This structure shouldn't be touched directly by target drivers.
* It is here so that we can inline dm_per_bio_data and
* dm_bio_from_per_bio_data
*/
struct dm_target_io {
struct dm_io *io;
struct dm_target *ti;
unsigned target_bio_nr;
unsigned *len_ptr;
struct bio clone;
};
static inline void *dm_per_bio_data(struct bio *bio, size_t data_size)
{
return (char *)bio - offsetof(struct dm_target_io, clone) - data_size;
}
static inline struct bio *dm_bio_from_per_bio_data(void *data, size_t data_size)
{
return (struct bio *)((char *)data + data_size + offsetof(struct dm_target_io, clone));
}
static inline unsigned dm_bio_get_target_bio_nr(const struct bio *bio)
{
return container_of(bio, struct dm_target_io, clone)->target_bio_nr;
}
void *dm_per_bio_data(struct bio *bio, size_t data_size);
struct bio *dm_bio_from_per_bio_data(void *data, size_t data_size);
unsigned dm_bio_get_target_bio_nr(const struct bio *bio);
int dm_register_target(struct target_type *t);
void dm_unregister_target(struct target_type *t);
......@@ -499,6 +459,11 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type);
*/
int dm_table_complete(struct dm_table *t);
/*
* Destroy the table when finished.
*/
void dm_table_destroy(struct dm_table *t);
/*
* Target may require that it is never sent I/O larger than len.
*/
......@@ -585,6 +550,7 @@ do { \
#define DM_ENDIO_DONE 0
#define DM_ENDIO_INCOMPLETE 1
#define DM_ENDIO_REQUEUE 2
#define DM_ENDIO_DELAY_REQUEUE 3
/*
* Definitions of return values from target map function.
......@@ -592,7 +558,7 @@ do { \
#define DM_MAPIO_SUBMITTED 0
#define DM_MAPIO_REMAPPED 1
#define DM_MAPIO_REQUEUE DM_ENDIO_REQUEUE
#define DM_MAPIO_DELAY_REQUEUE 3
#define DM_MAPIO_DELAY_REQUEUE DM_ENDIO_DELAY_REQUEUE
#define DM_MAPIO_KILL 4
#define dm_sector_div64(x, y)( \
......