Commit b25c6644 authored by Linus Torvalds

Merge tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - The largest change for this cycle is the DM zoned target's metadata
   version 2 feature that adds support for pairing regular block devices
   with a zoned device to ease the performance impact associated with
   the limited number of random zones on a zoned device.

   The changes came in three batches: the first prepared for and then
   added the ability to pair a single regular block device, the second
   was a batch of fixes to improve zoned's reclaim heuristic, and the
   third removed the limitation of only adding a single additional
   regular block device to allow many devices.

   Testing has shown linear scaling as more devices are added.

 - Add new emulated block size (ebs) target that emulates a smaller
   logical_block_size than a block device supports.

   The primary use case is to emulate "512e" devices that have a 512 byte
   logical_block_size and a 4KB physical_block_size. This is useful for
   some legacy applications that otherwise could not be used on 4K
   devices because they depend on issuing IO in 512 byte granularity.

 - Add discard interfaces to DM bufio. The first consumer of the
   interface is the dm-ebs target, which makes heavy use of dm-bufio.

 - Fix DM crypt's block queue_limits stacking to not truncate
   logical_block_size.

 - Add documentation for DM integrity's status line.

 - Switch DMDEBUG from a compile time config option to instead use
   dynamic debug via pr_debug.

 - Fix DM multipath target's heuristic for how it manages
   "queue_if_no_path" state internally.

   DM multipath now avoids disabling "queue_if_no_path" unless it is
   actually needed (e.g. in response to a configured timeout or an
   explicit "fail_if_no_path" message).

   This fixes reports of spurious -EIO being returned to userspace
   applications during fault tolerance testing with an NVMe backend.
   Various dynamic DMDEBUG messages were added to assist with debugging
   queue_if_no_path in the future.

 - Add a new DM multipath "Historical Service Time" Path Selector.

 - Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.

 - Improve DM writecache target performance by using explicit cache
   flushing for the target's single-threaded use case, and make a small
   cleanup to remove an unnecessary test in persistent_memory_claim.

 - Other small cleanups in DM core, dm-persistent-data, and DM
   integrity.

* tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
  dm crypt: avoid truncating the logical block size
  dm mpath: add DM device name to Failing/Reinstating path log messages
  dm mpath: enhance queue_if_no_path debugging
  dm mpath: restrict queue_if_no_path state machine
  dm mpath: simplify __must_push_back
  dm zoned: check superblock location
  dm zoned: prefer full zones for reclaim
  dm zoned: select reclaim zone based on device index
  dm zoned: allocate zone by device index
  dm zoned: support arbitrary number of devices
  dm zoned: move random and sequential zones into struct dmz_dev
  dm zoned: per-device reclaim
  dm zoned: add metadata pointer to struct dmz_dev
  dm zoned: add device pointer to struct dm_zone
  dm zoned: allocate temporary superblock for tertiary devices
  dm zoned: convert to xarray
  dm zoned: add a 'reserved' zone flag
  dm zoned: improve logging messages for reclaim
  dm zoned: avoid unnecessary device recalulation for secondary superblock
  dm zoned: add debugging message for reading superblocks
  ...
parents 818dbde7 64611a15
======
dm-ebs
======

This target is similar to the linear target except that it emulates
a smaller logical block size on a device with a larger logical block
size. Its main purpose is to provide emulation of 512 byte sectors on
devices that do not provide this emulation (i.e. 4K native disks).

Supported emulated logical block sizes are 512, 1024, 2048 and 4096 bytes.

The underlying block size can be set to more than 4K to test buffering
of larger units.

Table parameters
----------------
  <dev path> <offset> <emulated sectors> [<underlying sectors>]

Mandatory parameters:

    <dev path>:
        Full pathname to the underlying block-device,
        or a "major:minor" device-number.
    <offset>:
        Starting sector within the device;
        has to be a multiple of <emulated sectors>.
    <emulated sectors>:
        Number of sectors defining the logical block size to be emulated;
        1, 2, 4, 8 sectors of 512 bytes supported.

Optional parameter:

    <underlying sectors>:
        Number of sectors defining the logical block size of <dev path>.
        2^N supported, e.g. 8 = 8 sectors of 512 bytes = 4KiB.
        If not provided, the logical block size of <dev path> will be used.
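The parameter rules above can be checked mechanically before a table is loaded. The following user-space sketch is not the dm-ebs constructor. `ebs_params_valid()` is a hypothetical helper, all values are in 512-byte sectors, and a zero `underlying` means the optional parameter was omitted. Rejecting an underlying size smaller than the emulated one is an assumption drawn from the target's purpose, not something the text above states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is a non-zero power of two. */
static bool is_pow2(uint64_t v)
{
	return v && !(v & (v - 1));
}

/*
 * Hypothetical check of "<offset> <emulated sectors> [<underlying sectors>]".
 * All values are in 512-byte sectors; underlying == 0 means "not given".
 */
static bool ebs_params_valid(uint64_t offset, unsigned emulated,
			     unsigned underlying)
{
	/* Only 1, 2, 4 or 8 sectors (512 bytes .. 4 KiB) can be emulated. */
	if (emulated == 0 || emulated > 8 || !is_pow2(emulated))
		return false;
	/* The start offset must be a multiple of the emulated block size. */
	if (offset % emulated)
		return false;
	if (underlying) {
		/* 2^N sectors; assumed to be no smaller than the emulated size. */
		if (!is_pow2(underlying) || underlying < emulated)
			return false;
	}
	return true;
}
```

Both tables from the examples that follow (`1024 1` and `128 2 4`) pass this check. An offset such as 129 with 2 emulated sectors would be rejected.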

Examples:

Emulate 1 sector = 512 bytes logical block size on /dev/sda starting at
offset 1024 sectors, with the underlying device's block size set automatically:

ebs /dev/sda 1024 1

Emulate 2 sectors = 1KiB logical block size on /dev/sda starting at
offset 128 sectors, enforcing a 2KiB underlying device block size.
This requires /dev/sda to have a logical block size of 2KiB or less:

ebs /dev/sda 128 2 4
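Emulation needs buffering because a request expressed in emulated sectors generally does not cover whole underlying blocks. Any partially covered block has to be read, modified and written back. The arithmetic is sketched below with a hypothetical `ebs_map()` helper and `struct ebs_span` type. It only shows which underlying blocks a request touches. It does not show how dm-ebs itself is structured, since the target goes through dm-bufio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Underlying blocks touched by one request (hypothetical type). */
struct ebs_span {
	uint64_t first_block;	/* first underlying block touched */
	uint64_t nr_blocks;	/* number of underlying blocks touched */
	bool head_partial;	/* first block only partly covered */
	bool tail_partial;	/* last block only partly covered */
};

/*
 * Map nr_sectors 512-byte sectors starting at `sector` (relative to the
 * target) onto underlying blocks of u_sectors sectors each, where the
 * target starts at sector `offset` of the underlying device.
 */
static struct ebs_span ebs_map(uint64_t offset, uint64_t sector,
			       uint64_t nr_sectors, unsigned u_sectors)
{
	struct ebs_span s;
	uint64_t start, end;

	assert(nr_sectors > 0 && u_sectors > 0);
	start = offset + sector;		/* absolute first sector */
	end = start + nr_sectors;		/* exclusive end */
	s.first_block = start / u_sectors;
	s.nr_blocks = (end - 1) / u_sectors - s.first_block + 1;
	s.head_partial = start % u_sectors != 0;	/* needs read-modify-write */
	s.tail_partial = end % u_sectors != 0;		/* needs read-modify-write */
	return s;
}
```

For the first example (`ebs /dev/sda 1024 1` on a disk with 4KiB logical blocks), a one-sector write at sector 0 touches only underlying block 128 and leaves that block's tail partially covered.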
@@ -193,6 +193,14 @@ should not be changed when reloading the target because the layout of disk
 data depend on them and the reloaded target would be non-functional.
+Status line:
+1. the number of integrity mismatches
+2. provided data sectors - that is the number of sectors that the user
+   could use
+3. the current recalculating position (or '-' if we didn't recalculate)
 The layout of the formatted block device:
 * reserved sectors
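Tools that read this status line need to handle the '-' placeholder in the third field. Below is a minimal parser. It assumes the input holds only the three fields listed above, with dmsetup's leading start, length and target name already stripped. `struct integrity_status` and `parse_integrity_status()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields of the dm-integrity status line (hypothetical type). */
struct integrity_status {
	unsigned long long mismatches;
	unsigned long long provided_sectors;
	bool recalculating;			/* false when the field is '-' */
	unsigned long long recalc_pos;		/* valid only if recalculating */
};

/* Parse "<mismatches> <provided sectors> <recalc position|->"; 0 on success. */
static int parse_integrity_status(const char *line, struct integrity_status *st)
{
	char third[32];

	if (sscanf(line, "%llu %llu %31s", &st->mismatches,
		   &st->provided_sectors, third) != 3)
		return -1;
	if (strcmp(third, "-") == 0) {
		st->recalculating = false;
		st->recalc_pos = 0;
		return 0;
	}
	st->recalculating = true;
	return sscanf(third, "%llu", &st->recalc_pos) == 1 ? 0 : -1;
}
```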
@@ -37,9 +37,13 @@ Algorithm
 dm-zoned implements an on-disk buffering scheme to handle non-sequential
 write accesses to the sequential zones of a zoned block device.
 Conventional zones are used for caching as well as for storing internal
-metadata.
+metadata. It can also use a regular block device together with the zoned
+block device; in that case the regular block device will be split logically
+in zones with the same size as the zoned block device. These zones will be
+placed in front of the zones from the zoned block device and will be handled
+just like conventional zones.
-The zones of the device are separated into 2 types:
+The zones of the device(s) are separated into 2 types:
 1) Metadata zones: these are conventional zones used to store metadata.
 Metadata zones are not reported as useable capacity to the user.
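The zone numbering implied by this paragraph can be sketched as a simple index mapping. Regular-device zones occupy the first indexes and zoned-device zones follow. The helper names and `struct dmz_layout` are hypothetical. The sketch also assumes that a trailing partial zone on the regular device is left unused, which the text does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a regular + zoned device pair. */
struct dmz_layout {
	uint64_t zone_sectors;		/* zone size of the zoned device */
	uint64_t regular_sectors;	/* capacity of the regular device */
	uint64_t nr_zoned_zones;	/* zones reported by the zoned device */
};

/* Whole zones carved out of the regular device (assumption: tail unused). */
static uint64_t dmz_nr_regular_zones(const struct dmz_layout *l)
{
	return l->regular_sectors / l->zone_sectors;
}

/*
 * Map a global zone index to (device, start sector). Regular-device zones
 * come first and are treated like conventional zones.
 */
static bool dmz_zone_location(const struct dmz_layout *l, uint64_t zone,
			      bool *on_regular, uint64_t *start_sector)
{
	uint64_t nr_reg = dmz_nr_regular_zones(l);

	if (zone >= nr_reg + l->nr_zoned_zones)
		return false;
	*on_regular = zone < nr_reg;
	*start_sector = (*on_regular ? zone : zone - nr_reg) * l->zone_sectors;
	return true;
}
```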
@@ -127,6 +131,13 @@ resumed. Flushing metadata thus only temporarily delays write and
 discard requests. Read requests can be processed concurrently while
 metadata flush is being executed.
+If a regular device is used in conjunction with the zoned block device,
+a third set of metadata (without the zone bitmaps) is written to the
+start of the zoned block device. This metadata has a generation counter of
+'0' and will never be updated during normal operation; it just serves for
+identification purposes. The first and second copy of the metadata
+are located at the start of the regular block device.
 Usage
 =====
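When a regular device is attached, only the first two metadata copies are ever updated. The third copy, with generation 0, only identifies the zoned device. Below is a sketch of choosing which updatable copy to load. It assumes, beyond what the text above states, that the valid copy with the larger generation counter is the current one. `struct dmz_sb_copy` and `dmz_pick_sb()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one on-disk metadata copy. */
struct dmz_sb_copy {
	bool valid;		/* magic and checksum verified */
	uint64_t gen;		/* generation counter */
};

/*
 * Pick which of the two updatable copies (index 0 or 1) to load, or -1 if
 * neither is usable. The identification-only third copy (gen 0 on the
 * zoned device) is deliberately not a candidate.
 */
static int dmz_pick_sb(const struct dmz_sb_copy copies[2])
{
	if (copies[0].valid && copies[1].valid)
		return copies[0].gen >= copies[1].gen ? 0 : 1;
	if (copies[0].valid)
		return 0;
	if (copies[1].valid)
		return 1;
	return -1;
}
```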
@@ -138,9 +149,46 @@ Ex::
 	dmzadm --format /dev/sdxx
-For a formatted device, the target can be created normally with the
-dmsetup utility. The only parameter that dm-zoned requires is the
-underlying zoned block device name. Ex::
-	echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
-	dmsetup create dmz-`basename ${dev}`
+If two drives are to be used, both devices must be specified, with the
+regular block device as the first device.
+Ex::
+	dmzadm --format /dev/sdxx /dev/sdyy
+Formatted device(s) can be started with the dmzadm utility, too:
+Ex::
+	dmzadm --start /dev/sdxx /dev/sdyy
+Information about the internal layout and current usage of the zones can
+be obtained with the 'status' callback from dmsetup:
+Ex::
+	dmsetup status /dev/dm-X
+will return a line
+	0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential
+where <nr_zones> is the total number of zones, <nr_unmap_rnd> is the number
+of unmapped (i.e. free) random zones, <nr_rnd> the total number of random zones,
+<nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq> the
+total number of sequential zones.
+Normally the reclaim process will be started once fewer than 50 percent
+of the random zones are free. In order to start the reclaim process manually
+even before reaching this threshold, the 'dmsetup message' function can be
+used:
+Ex::
+	dmsetup message /dev/dm-X 0 reclaim
+will start the reclaim process and random zones will be moved to sequential
+zones.
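The status line and the 50 percent rule above can be combined to decide, from user space, whether a manual `reclaim` message would run ahead of the automatic trigger. The sketch parses only the single-device form documented here. `struct dmz_status`, `parse_dmz_status()` and `dmz_below_reclaim_threshold()` are hypothetical. The kernel's own reclaim decision may take more into account than this single ratio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counters from the single-device dm-zoned status line (hypothetical type). */
struct dmz_status {
	unsigned long long start, size;
	unsigned nr_zones;
	unsigned nr_unmap_rnd, nr_rnd;
	unsigned nr_unmap_seq, nr_seq;
};

/* Parse the documented status line format; 0 on success. */
static int parse_dmz_status(const char *line, struct dmz_status *st)
{
	int n = sscanf(line,
		       "%llu %llu zoned %u zones %u/%u random %u/%u sequential",
		       &st->start, &st->size, &st->nr_zones,
		       &st->nr_unmap_rnd, &st->nr_rnd,
		       &st->nr_unmap_seq, &st->nr_seq);

	return n == 7 ? 0 : -1;
}

/* True when fewer than 50 percent of the random zones are free. */
static bool dmz_below_reclaim_threshold(const struct dmz_status *st)
{
	return 2ULL * st->nr_unmap_rnd < st->nr_rnd;
}
```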
@@ -269,6 +269,7 @@ config DM_UNSTRIPED
 config DM_CRYPT
 	tristate "Crypt target support"
 	depends on BLK_DEV_DM
+	depends on (ENCRYPTED_KEYS || ENCRYPTED_KEYS=n)
 	select CRYPTO
 	select CRYPTO_CBC
 	select CRYPTO_ESSIV
@@ -336,6 +337,14 @@ config DM_WRITECACHE
 	  The writecache target doesn't cache reads because reads are supposed
 	  to be cached in standard RAM.
+config DM_EBS
+	tristate "Emulated block size target (EXPERIMENTAL)"
+	depends on BLK_DEV_DM
+	select DM_BUFIO
+	help
+	  dm-ebs emulates smaller logical block size on backing devices
+	  with larger ones (e.g. 512 byte sectors on 4K native disks).
 config DM_ERA
 	tristate "Era target (EXPERIMENTAL)"
 	depends on BLK_DEV_DM
@@ -443,6 +452,17 @@ config DM_MULTIPATH_ST
 	  If unsure, say N.
+config DM_MULTIPATH_HST
+	tristate "I/O Path Selector based on historical service time"
+	depends on DM_MULTIPATH
+	help
+	  This path selector is a dynamic load balancer which selects
+	  the path expected to complete the incoming I/O in the shortest
+	  time by comparing estimated service time (based on historical
+	  service time).
+	  If unsure, say N.
 config DM_DELAY
 	tristate "I/O delaying target"
 	depends on BLK_DEV_DM
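The DM_MULTIPATH_HST help text describes the selector as comparing estimated service times that are derived from history. The sketch below illustrates that idea in deliberately simplified form. It is not the dm-historical-service-time algorithm. Each path keeps an exponentially weighted moving average of completion latency, and a new I/O is estimated to finish after that average times one plus the number of outstanding I/Os. `struct hst_path`, `hst_update()` and `hst_select()` are hypothetical, and the 1/8 weight is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-path state for a simplified historical-service-time selector. */
struct hst_path {
	uint64_t avg_ns;	/* EWMA of service time; 0 = no history yet */
	unsigned inflight;	/* I/Os submitted but not yet completed */
};

/* Fold one completion latency into the average with weight 1/8. */
static void hst_update(struct hst_path *p, uint64_t sample_ns)
{
	if (p->avg_ns == 0)
		p->avg_ns = sample_ns;		/* first sample seeds the average */
	else if (sample_ns >= p->avg_ns)
		p->avg_ns += (sample_ns - p->avg_ns) / 8;
	else
		p->avg_ns -= (p->avg_ns - sample_ns) / 8;
}

/*
 * Index of the path with the smallest estimated completion time. Paths
 * without history estimate 0 and are therefore tried first.
 */
static size_t hst_select(const struct hst_path *paths, size_t n)
{
	size_t best = 0;
	uint64_t best_est = UINT64_MAX;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		/* Wait behind the in-flight I/Os, then service the new one. */
		uint64_t est = paths[i].avg_ns * (paths[i].inflight + 1ULL);

		if (est < best_est) {
			best_est = est;
			best = i;
		}
	}
	return best;
}
```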
@@ -17,6 +17,7 @@ dm-thin-pool-y += dm-thin.o dm-thin-metadata.o
 dm-cache-y += dm-cache-target.o dm-cache-metadata.o dm-cache-policy.o \
 	    dm-cache-background-tracker.o
 dm-cache-smq-y += dm-cache-policy-smq.o
+dm-ebs-y += dm-ebs-target.o
 dm-era-y += dm-era-target.o
 dm-clone-y += dm-clone-target.o dm-clone-metadata.o
 dm-verity-y += dm-verity-target.o
@@ -54,6 +55,7 @@ obj-$(CONFIG_DM_FLAKEY) += dm-flakey.o
 obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o
 obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o
 obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o
+obj-$(CONFIG_DM_MULTIPATH_HST) += dm-historical-service-time.o
 obj-$(CONFIG_DM_SWITCH) += dm-switch.o
 obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/
@@ -65,6 +67,7 @@ obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
 obj-$(CONFIG_DM_VERITY) += dm-verity.o
 obj-$(CONFIG_DM_CACHE) += dm-cache.o
 obj-$(CONFIG_DM_CACHE_SMQ) += dm-cache-smq.o
+obj-$(CONFIG_DM_EBS) += dm-ebs.o
 obj-$(CONFIG_DM_ERA) += dm-era.o
 obj-$(CONFIG_DM_CLONE) += dm-clone.o
 obj-$(CONFIG_DM_LOG_WRITES) += dm-log-writes.o
@@ -256,12 +256,35 @@ static struct dm_buffer *__find(struct dm_bufio_client *c, sector_t block)
 		if (b->block == block)
 			return b;
-		n = (b->block < block) ? n->rb_left : n->rb_right;
+		n = block < b->block ? n->rb_left : n->rb_right;
 	}
 	return NULL;
 }
+static struct dm_buffer *__find_next(struct dm_bufio_client *c, sector_t block)
+{
+	struct rb_node *n = c->buffer_tree.rb_node;
+	struct dm_buffer *b;
+	struct dm_buffer *best = NULL;
+	while (n) {
+		b = container_of(n, struct dm_buffer, node);
+		if (b->block == block)
+			return b;
+		if (block <= b->block) {
+			n = n->rb_left;
+			best = b;
+		} else {
+			n = n->rb_right;
+		}
+	}
+	return best;
+}
 static void __insert(struct dm_bufio_client *c, struct dm_buffer *b)
 {
 	struct rb_node **new = &c->buffer_tree.rb_node, *parent = NULL;
@@ -276,8 +299,8 @@ static void __insert(struct dm_bufio_client *c, struct dm_buffer *b)
 		}
 		parent = *new;
-		new = (found->block < b->block) ?
-			&((*new)->rb_left) : &((*new)->rb_right);
+		new = b->block < found->block ?
+			&found->node.rb_left : &found->node.rb_right;
 	}
 	rb_link_node(&b->node, parent, new);
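The bufio hunks above switch `__find()` and `__insert()` to the same ascending order, with smaller block numbers to the left. They also add `__find_next()`, which returns the buffer for `block` or, failing that, the one with the smallest larger block number. The same search is sketched below on a plain, unbalanced binary search tree so that it runs in user space. `struct node`, `bst_insert()` and `bst_find_next()` are hypothetical stand-ins for the rbtree code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct node {
	uint64_t block;
	struct node *left, *right;
};

/* Insert in ascending order: smaller blocks go left, as in __insert(). */
static struct node *bst_insert(struct node *root, uint64_t block)
{
	struct node **link = &root;

	while (*link) {
		if (block == (*link)->block)
			return root;		/* already present */
		link = block < (*link)->block ? &(*link)->left : &(*link)->right;
	}
	*link = calloc(1, sizeof(**link));
	assert(*link);
	(*link)->block = block;
	return root;
}

/* Exact match, else the node with the smallest block greater than `block`. */
static struct node *bst_find_next(struct node *n, uint64_t block)
{
	struct node *best = NULL;

	while (n) {
		if (n->block == block)
			return n;
		if (block < n->block) {
			best = n;	/* candidate; a closer one may lie to the left */
			n = n->left;
		} else {
			n = n->right;
		}
	}
	return best;
}
```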
@@ -631,6 +654,19 @@ static void use_bio(struct dm_buffer *b, int rw, sector_t sector,
 	submit_bio(bio);
 }
+static inline sector_t block_to_sector(struct dm_bufio_client *c, sector_t block)
+{
+	sector_t sector;
+	if (likely(c->sectors_per_block_bits >= 0))
+		sector = block << c->sectors_per_block_bits;
+	else
+		sector = block * (c->block_size >> SECTOR_SHIFT);
+	sector += c->start;
+	return sector;
+}
 static void submit_io(struct dm_buffer *b, int rw, void (*end_io)(struct dm_buffer *, blk_status_t))
 {
 	unsigned n_sectors;
@@ -639,11 +675,7 @@ static void submit_io(struct dm_buffer *b, int rw, void (*end_io)(struct dm_buff
 	b->end_io = end_io;
-	if (likely(b->c->sectors_per_block_bits >= 0))
-		sector = b->block << b->c->sectors_per_block_bits;
-	else
-		sector = b->block * (b->c->block_size >> SECTOR_SHIFT);
-	sector += b->c->start;
+	sector = block_to_sector(b->c, b->block);
 	if (rw != REQ_OP_WRITE) {
 		n_sectors = b->c->block_size >> SECTOR_SHIFT;
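`block_to_sector()` factors out the conversion that was previously inlined in `submit_io()`. When the block size is a power of two, `sectors_per_block_bits` holds log2 of the sectors per block and a shift suffices. Otherwise the field is negative and the code multiplies. The user-space sketch below uses a hypothetical `struct client` in place of `struct dm_bufio_client`.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Hypothetical stand-in for the fields block_to_sector() reads. */
struct client {
	int sectors_per_block_bits;	/* log2(sectors per block), or -1 */
	unsigned block_size;		/* in bytes */
	uint64_t start;			/* sector offset of block 0 */
};

static uint64_t block_to_sector(const struct client *c, uint64_t block)
{
	uint64_t sector;

	if (c->sectors_per_block_bits >= 0)	/* power-of-two block size */
		sector = block << c->sectors_per_block_bits;
	else					/* e.g. 3 KiB blocks */
		sector = block * (c->block_size >> SECTOR_SHIFT);
	return sector + c->start;
}
```

With 4 KiB blocks (3 bits) and `start` = 100, block 5 maps to sector 140. With 3 KiB blocks it maps to 5 * 6 + 100 = 130.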
@@ -1325,6 +1357,30 @@ int dm_bufio_issue_flush(struct dm_bufio_client *c)
 }
 EXPORT_SYMBOL_GPL(dm_bufio_issue_flush);
+/*
+ * Use dm-io to send a discard request to flush the device.
+ */
+int dm_bufio_issue_discard(struct dm_bufio_client *c, sector_t block, sector_t count)
+{
+	struct dm_io_request io_req = {
+		.bi_op = REQ_OP_DISCARD,
+		.bi_op_flags = REQ_SYNC,
+		.mem.type = DM_IO_KMEM,
+		.mem.ptr.addr = NULL,
+		.client = c->dm_io,
+	};
+	struct dm_io_region io_reg = {
+		.bdev = c->bdev,
+		.sector = block_to_sector(c, block),
+		.count = block_to_sector(c, count),
+	};
+	BUG_ON(dm_bufio_in_request());
+	return dm_io(&io_req, 1, &io_reg, NULL);
+}
+EXPORT_SYMBOL_GPL(dm_bufio_issue_discard);
 /*
  * We first delete any other buffer that may be at that new location.
  *
@@ -1401,6 +1457,14 @@ void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block)
 }
 EXPORT_SYMBOL_GPL(dm_bufio_release_move);
+static void forget_buffer_locked(struct dm_buffer *b)
+{
+	if (likely(!b->hold_count) && likely(!b->state)) {
+		__unlink_buffer(b);
+		__free_buffer_wake(b);
+	}
+}
 /*
  * Free the given buffer.
  *
@@ -1414,15 +1478,36 @@ void dm_bufio_forget(struct dm_bufio_client *c, sector_t block)
 	dm_bufio_lock(c);
 	b = __find(c, block);
-	if (b && likely(!b->hold_count) && likely(!b->state)) {
-		__unlink_buffer(b);
-		__free_buffer_wake(b);
-	}
+	if (b)
+		forget_buffer_locked(b);
 	dm_bufio_unlock(c);
 }
 EXPORT_SYMBOL_GPL(dm_bufio_forget);
+void dm_bufio_forget_buffers(struct dm_bufio_client *c, sector_t block, sector_t n_blocks)
+{
+	struct dm_buffer *b;
+	sector_t end_block = block + n_blocks;
+	while (block < end_block) {
+		dm_bufio_lock(c);
+		b = __find_next(c, block);
+		if (b) {
+			block = b->block + 1;
+			forget_buffer_locked(b);
+		}
+		dm_bufio_unlock(c);
+		if (!b)
+			break;
+	}
+}
+EXPORT_SYMBOL_GPL(dm_bufio_forget_buffers);
 void dm_bufio_set_minimum_buffers(struct dm_bufio_client *c, unsigned n)
 {
 	c->minimum_buffers = n;
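`dm_bufio_forget_buffers()` walks a block range by repeatedly asking for the next cached block at or after a cursor, taking and dropping the client lock around each step. Through `forget_buffer_locked()` it frees only buffers that are neither held nor in a non-zero state. Below is a user-space sketch of the same cursor walk over a sorted array scanned linearly. `struct buf`, `struct cache`, `cache_find_next()` and `cache_forget_range()` are hypothetical, locking is omitted, and `dirty` stands in for a non-zero buffer state. The sketch stops as soon as the next cached block lies outside the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached buffer; only idle buffers may be dropped. */
struct buf {
	uint64_t block;
	unsigned hold_count;	/* users currently holding the buffer */
	bool dirty;		/* stands in for a non-zero b->state */
	bool cached;
};

/* Cache kept sorted by block (stand-in for the rbtree). */
struct cache {
	struct buf *bufs;
	size_t n;
};

/* First cached buffer whose block is >= block, or NULL. */
static struct buf *cache_find_next(struct cache *c, uint64_t block)
{
	for (size_t i = 0; i < c->n; i++)
		if (c->bufs[i].cached && c->bufs[i].block >= block)
			return &c->bufs[i];
	return NULL;
}

/* Drop idle buffers in [block, block + n_blocks); returns how many. */
static size_t cache_forget_range(struct cache *c, uint64_t block, uint64_t n_blocks)
{
	uint64_t end = block + n_blocks;
	size_t dropped = 0;

	while (block < end) {
		struct buf *b = cache_find_next(c, block);

		if (!b || b->block >= end)
			break;			/* nothing more inside the range */
		block = b->block + 1;		/* advance the cursor first */
		if (!b->hold_count && !b->dirty) {
			b->cached = false;	/* "free" the idle buffer */
			dropped++;
		}
	}
	return dropped;
}
```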
@@ -34,7 +34,9 @@
 #include <crypto/aead.h>
 #include <crypto/authenc.h>
 #include <linux/rtnetlink.h> /* for struct rtattr and RTA macros only */
+#include <linux/key-type.h>
 #include <keys/user-type.h>
+#include <keys/encrypted-type.h>
 #include <linux/device-mapper.h>
@@ -212,7 +214,7 @@ struct crypt_config {
 	struct mutex bio_alloc_lock;
 	u8 *authenc_key; /* space for keys in authenc() format (if used) */
-	u8 key[0];
+	u8 key[];
 };
 #define MIN_IOS 64
@@ -2215,12 +2217,47 @@ static bool contains_whitespace(const char *str)
 	return false;
 }
+static int set_key_user(struct crypt_config *cc, struct key *key)
+{
+	const struct user_key_payload *ukp;
+	ukp = user_key_payload_locked(key);
+	if (!ukp)
+		return -EKEYREVOKED;
+	if (cc->key_size != ukp->datalen)
+		return -EINVAL;
+	memcpy(cc->key, ukp->data, cc->key_size);
+	return 0;
+}
+#if defined(CONFIG_ENCRYPTED_KEYS) || defined(CONFIG_ENCRYPTED_KEYS_MODULE)
+static int set_key_encrypted(struct crypt_config *cc, struct key *key)
+{
+	const struct encrypted_key_payload *ekp;
+	ekp = key->payload.data[0];
+	if (!ekp)
+		return -EKEYREVOKED;
+	if (cc->key_size != ekp->decrypted_datalen)
+		return -EINVAL;
+	memcpy(cc->key, ekp->decrypted_data, cc->key_size);
+	return 0;
+}
+#endif /* CONFIG_ENCRYPTED_KEYS */
 static int crypt_set_keyring_key(struct crypt_config *cc, const char *key_string)
 {
 	char *new_key_string, *key_desc;
 	int ret;
+	struct key_type *type;
 	struct key *key;
-	const struct user_key_payload *ukp;
+	int (*set_key)(struct crypt_config *cc, struct key *key);
 	/*
 	 * Reject key_string with whitespace. dm core currently lacks code for
@@ -2236,16 +2273,26 @@ static int crypt_set_keyring_key(struct crypt_config *cc, const char *key_string
 	if (!key_desc || key_desc == key_string || !strlen(key_desc + 1))
 		return -EINVAL;
-	if (strncmp(key_string, "logon:", key_desc - key_string + 1) &&
-	    strncmp(key_string, "user:", key_desc - key_string + 1))
+	if (!strncmp(key_string, "logon:", key_desc - key_string + 1)) {
+		type = &key_type_logon;
+		set_key = set_key_user;
+	} else if (!strncmp(key_string, "user:", key_desc - key_string + 1)) {
+		type = &key_type_user;
+		set_key = set_key_user;
+#if defined(CONFIG_ENCRYPTED_KEYS) || defined(CONFIG_ENCRYPTED_KEYS_MODULE)
+	} else if (!strncmp(key_string, "encrypted:", key_desc - key_string + 1)) {
+		type = &key_type_encrypted;
+		set_key = set_key_encrypted;
+#endif
+	} else {
 		return -EINVAL;
+	}
 	new_key_string = kstrdup(key_string, GFP_KERNEL);
 	if (!new_key_string)
 		return -ENOMEM;
-	key = request_key(key_string[0] == 'l' ? &key_type_logon : &key_type_user,
-			  key_desc + 1, NULL);
+	key = request_key(type, key_desc + 1, NULL);
 	if (IS_ERR(key)) {
 		kzfree(new_key_string);
 		return PTR_ERR(key);
@@ -2253,23 +2300,14 @@ static int crypt_set_keyring_key(struct crypt_config *cc, const char *key_string
 	down_read(&key->sem);
-	ukp = user_key_payload_locked(key);
-	if (!ukp) {
-		up_read(&key->sem);
-		key_put(key);
-		kzfree(new_key_string);
-		return -EKEYREVOKED;
-	}
-	if (cc->key_size != ukp->datalen) {
+	ret = set_key(cc, key);
+	if (ret < 0) {
 		up_read(&key->sem);
 		key_put(key);
 		kzfree(new_key_string);
-		return -EINVAL;
+		return ret;
 	}
-	memcpy(cc->key, ukp->data, cc->key_size);
 	up_read(&key->sem);
 	key_put(key);
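After this change, `crypt_set_keyring_key()` uses the `<type>:` prefix to select both the key type and a payload-copy callback. It accepts `encrypted:` only when encrypted-key support is configured. The table-driven sketch below mirrors that dispatch in user space. `struct key_handler`, `enum payload_kind` and `lookup_handler()` are hypothetical, and the `HAVE_ENCRYPTED_KEYS` macro only stands in for the `CONFIG_ENCRYPTED_KEYS` test. As in the kernel code, the prefix comparison includes the colon, so a type must match exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HAVE_ENCRYPTED_KEYS 1	/* stand-in for the CONFIG_ENCRYPTED_KEYS test */

enum payload_kind { USER_PAYLOAD, ENCRYPTED_PAYLOAD };

struct key_handler {
	const char *prefix;		/* key type including the trailing ':' */
	enum payload_kind payload;	/* which payload layout gets copied */
};

static const struct key_handler handlers[] = {
	{ "logon:", USER_PAYLOAD },		/* like set_key_user() */
	{ "user:", USER_PAYLOAD },		/* like set_key_user() */
#if HAVE_ENCRYPTED_KEYS
	{ "encrypted:", ENCRYPTED_PAYLOAD },	/* like set_key_encrypted() */
#endif
};

/* Return the handler for "<type>:<description>", or NULL if invalid. */
static const struct key_handler *lookup_handler(const char *key_string)
{
	const char *key_desc = strchr(key_string, ':');
	size_t len;

	/* Reject a missing separator, an empty type or an empty description. */
	if (!key_desc || key_desc == key_string || !strlen(key_desc + 1))
		return NULL;
	len = (size_t)(key_desc - key_string) + 1;	/* type plus ':' */
	for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
		if (strlen(handlers[i].prefix) == len &&
		    !strncmp(key_string, handlers[i].prefix, len))
			return &handlers[i];
	return NULL;
}
```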
@@ -2323,7 +2361,7 @@ static int get_key_size(char **key_string)
 	return (*key_string[0] == ':') ? -EINVAL : strlen(*key_string) >> 1;
 }
-#endif
+#endif /* CONFIG_KEYS */
 static int crypt_set_key(struct crypt_config *cc, char *key)
 {
@@ -3274,7 +3312,7 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	limits->max_segment_size = PAGE_SIZE;
 	limits->logical_block_size =
-		max_t(unsigned short, limits->logical_block_size, cc->sector_size);
+		max_t(unsigned, limits->logical_block_size, cc->sector_size);
 	limits->physical_block_size =
 		max_t(unsigned, limits->physical_block_size, cc->sector_size);
 	limits->io_min = max_t(unsigned, limits->io_min, cc->sector_size);
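The `crypt_io_hints()` change matters because `max_t(type, x, y)` converts both operands to `type` before comparing them. With `unsigned short`, any stacked logical block size of 64 KiB or more wraps around before the comparison. The sketch below shows the effect with a simplified, hypothetical `MAX_T` macro. The kernel's `max_t` additionally avoids evaluating its arguments twice.

```c
#include <assert.h>

/* Simplified max_t(): convert both operands to `type`, then compare. */
#define MAX_T(type, x, y) ((type)(x) > (type)(y) ? (type)(x) : (type)(y))

/* Form used before the fix. */
static unsigned lbs_old(unsigned lbs, unsigned sector_size)
{
	return MAX_T(unsigned short, lbs, sector_size);
}

/* Form used after the fix. */
static unsigned lbs_new(unsigned lbs, unsigned sector_size)
{
	return MAX_T(unsigned, lbs, sector_size);
}
```

For ordinary sizes both forms agree. A 65536-byte limit becomes 0 after conversion to `unsigned short`, so the old form returns the 4096-byte crypt sector size instead.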
@@ -3282,7 +3320,7 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 static struct target_type crypt_target = {
 	.name = "crypt",
-	.version = {1, 20, 0},
+	.version = {1, 21, 0},
 	.module = THIS_MODULE,
 	.ctr = crypt_ctr,
 	.dtr = crypt_dtr,
@@ -92,7 +92,7 @@ struct journal_entry {
 		} s;
 		__u64 sector;
 	} u;
-	commit_id_t last_bytes[0];
+	commit_id_t last_bytes[];
 	/* __u8 tag[0]; */
 };
@@ -1553,8 +1553,6 @@ static void integrity_metadata(struct work_struct *w)
 		char checksums_onstack[max((size_t)HASH_MAX_DIGESTSIZE, MAX_TAG_SIZE)];
 		sector_t sector;
 		unsigned sectors_to_process;
-		sector_t save_metadata_block;
-		unsigned save_metadata_offset;
 		if (unlikely(ic->mode == 'R'))
 			goto skip_io;
@@ -1605,8 +1603,6 @@ static void integrity_metadata(struct work_struct *w)
 			goto skip_io;
 		}
-		save_metadata_block = dio->metadata_block;
-		save_metadata_offset = dio->metadata_offset;
 		sector = dio->range.logical_sector;
 		sectors_to_process = dio->range.n_sectors;
@@ -127,7 +127,7 @@ struct pending_block {
 	char *data;
 	u32 datalen;
 	struct list_head list;
-	struct bio_vec vecs[0];
+	struct bio_vec vecs[];
 };
 struct per_bio_data {
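Several hunks replace zero-length arrays such as `key[0]`, `last_bytes[0]` and `vecs[0]` with C99 flexible array members. The allocation pattern does not change: size the header with `sizeof` and add room for the trailing elements. Below is a user-space sketch with a hypothetical `struct blob` and `blob_new()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header followed by a variable-length payload, like crypt_config's key[]. */
struct blob {
	size_t len;
	uint8_t data[];		/* flexible array member: must be the last field */
};

static struct blob *blob_new(const void *src, size_t len)
{
	/* sizeof(struct blob) does not include data[], so add the payload size. */
	struct blob *b = malloc(sizeof(*b) + len);

	if (!b)
		return NULL;
	b->len = len;
	memcpy(b->data, src, len);
	return b;
}
```

In kernel code the same size computation is usually written with `struct_size()` from `<linux/overflow.h>`, which also guards against arithmetic overflow.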
@@ -439,7 +439,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
 }
 /*
- * dm_report_EIO() is a macro instead of a function to make pr_debug()
+ * dm_report_EIO() is a macro instead of a function to make pr_debug_ratelimited()
  * report the function name and line number of the function from which
  * it has been invoked.
  */
@@ -447,7 +447,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes)
 do { \
 	struct mapped_device *md = dm_table_get_md((m)->ti->table); \
 \
-	pr_debug("%s: returning EIO; QIFNP = %d; SQIFNP = %d; DNFS = %d\n", \
+	DMDEBUG_LIMIT("%s: returning EIO; QIFNP = %d; SQIFNP = %d; DNFS = %d", \
 		 dm_device_name(md), \
 		 test_bit(MPATHF_QUEUE_IF_NO_PATH, &(m)->flags), \
 		 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &(m)->flags), \
@@ -457,33 +457,15 @@ do { \
 /*
  * Check whether bios must be queued in the device-mapper core rather
  * than here in the target.
- *
- * If MPATHF_QUEUE_IF_NO_PATH and MPATHF_SAVED_QUEUE_IF_NO_PATH hold
- * the same value then we are not between multipath_presuspend()
- * and multipath_resume() calls and we have no need to check
- * for the DMF_NOFLUSH_SUSPENDING flag.
  */
-static bool __must_push_back(struct multipath *m, unsigned long flags)
+static bool __must_push_back(struct multipath *m)
 {
-	return ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &flags) !=
-		 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &flags)) &&
-		dm_noflush_suspending(m->ti));
+	return dm_noflush_suspending(m->ti);
 }
-/*
- * Following functions use READ_ONCE to get atomic access to
- * all m->flags to avoid taking spinlock
- */
 static bool must_push_back_rq(struct multipath *m)
 {
-	unsigned long flags = READ_ONCE(m->flags);
-	return test_bit(MPATHF_QUEUE_IF_NO_PATH, &flags) || __must_push_back(m, flags);
-}
-static bool must_push_back_bio(struct multipath *m)
-{
-	unsigned long flags = READ_ONCE(m->flags);
-	return __must_push_back(m, flags);
+	return test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) || __must_push_back(m);
 }
 /*
@@ -567,7 +549,8 @@ static void multipath_release_clone(struct request *clone,
 		if (pgpath && pgpath->pg->ps.type->end_io)
 			pgpath->pg->ps.type->end_io(&pgpath->pg->ps,
 						    &pgpath->path,
-						    mpio->nr_bytes);
+						    mpio->nr_bytes,
+						    clone->io_start_time_ns);
 	}
 	blk_put_request(clone);
@@ -619,7 +602,7 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio,
 		return DM_MAPIO_SUBMITTED;
 	if (!pgpath) {
-		if (must_push_back_bio(m))
+		if (__must_push_back(m))
 			return DM_MAPIO_REQUEUE;
 		dm_report_EIO(m);
 		return DM_MAPIO_KILL;
@@ -709,15 +692,38 @@ static void process_queued_bios(struct work_struct *work)
  * If we run out of usable paths, should we queue I/O or error it?
  */
 static int queue_if_no_path(struct multipath *m, bool queue_if_no_path,
-			    bool save_old_value)
+			    bool save_old_value, const char *caller)
 {
 	unsigned long flags;
+	bool queue_if_no_path_bit, saved_queue_if_no_path_bit;
+	const char *dm_dev_name = dm_device_name(dm_table_get_md(m->ti->table));
+	DMDEBUG("%s: %s caller=%s queue_if_no_path=%d save_old_value=%d",
+		dm_dev_name, __func__, caller, queue_if_no_path, save_old_value);
 	spin_lock_irqsave(&m->lock, flags);
-	assign_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags,
-		   (save_old_value && test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) ||
-		   (!save_old_value && queue_if_no_path));
+	queue_if_no_path_bit = test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
+	saved_queue_if_no_path_bit = test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags);
+	if (save_old_value) {
+		if (unlikely(!queue_if_no_path_bit && saved_queue_if_no_path_bit)) {
+			DMERR("%s: QIFNP disabled but saved as enabled, saving again loses state, not saving!",
+			      dm_dev_name);
+		} else
+			assign_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags, queue_if_no_path_bit);
+	} else if (!queue_if_no_path && saved_queue_if_no_path_bit) {
+		/* due to "fail_if_no_path" message, need to honor it. */
+		clear_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags);
+	}
 	assign_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags, queue_if_no_path);
+	DMDEBUG("%s: after %s changes; QIFNP = %d; SQIFNP = %d; DNFS = %d",
+		dm_dev_name, __func__,
+		test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags),
+		test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags),
+		dm_noflush_suspending(m->ti));
 	spin_unlock_irqrestore(&m->lock, flags);
 	if (!queue_if_no_path) {
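The removed code recomputed the saved bit on every call. As a result, a second save-and-disable call would overwrite a saved "enabled" with "disabled". The new code refuses such a save. When it is not saving, it clears the saved bit only if queueing is being turned off, which is the "fail_if_no_path" case. The transitions can be replayed in user space with the sketch below. `struct qifnp_state` and `qifnp_update()` are hypothetical, and the two bools stand in for `MPATHF_QUEUE_IF_NO_PATH` and `MPATHF_SAVED_QUEUE_IF_NO_PATH`. Locking, logging and the work the kernel does afterwards when queueing is disabled are all omitted.

```c
#include <assert.h>
#include <stdbool.h>

struct qifnp_state {
	bool qifnp;	/* MPATHF_QUEUE_IF_NO_PATH */
	bool sqifnp;	/* MPATHF_SAVED_QUEUE_IF_NO_PATH */
};

/* Returns false when a save was refused because it would lose state. */
static bool qifnp_update(struct qifnp_state *s, bool queue_if_no_path,
			 bool save_old_value)
{
	bool ok = true;

	if (save_old_value) {
		if (!s->qifnp && s->sqifnp)
			ok = false;		/* keep the saved "enabled" */
		else
			s->sqifnp = s->qifnp;
	} else if (!queue_if_no_path && s->sqifnp) {
		s->sqifnp = false;		/* honor "fail_if_no_path" */
	}
	s->qifnp = queue_if_no_path;
	return ok;
}
```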
@@ -738,7 +744,7 @@ static void queue_if_no_path_timeout_work(struct timer_list *t)
 	struct mapped_device *md = dm_table_get_md(m->ti->table);
 	DMWARN("queue_if_no_path timeout on %s, failing queued IO", dm_device_name(md));
-	queue_if_no_path(m, false, false);
+	queue_if_no_path(m, false, false, __func__);
 }
 /*
@@ -1078,7 +1084,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 		argc--;
 		if (!strcasecmp(arg_name, "queue_if_no_path")) {
-			r = queue_if_no_path(m, true, false);
+			r = queue_if_no_path(m, true, false, __func__);
 			continue;
 		}
@@ -1279,7 +1285,9 @@ static int fail_path(struct pgpath *pgpath)
 	if (!pgpath->is_active)
 		goto out;
-	DMWARN("Failing path %s.", pgpath->path.dev->name);
+	DMWARN("%s: Failing path %s.",
+	       dm_device_name(dm_table_get_md(m->ti->table)),
+	       pgpath->path.dev->name);
 	pgpath->pg->ps.type->fail_path(&pgpath->pg->ps, &pgpath->path);
 	pgpath->is_active = false;
@@ -1318,7 +1326,9 @@ static int reinstate_path(struct pgpath *pgpath)
 	if (pgpath->is_active)
 		goto out;
-	DMWARN("Reinstating path %s.", pgpath->path.dev->name);
+	DMWARN("%s: Reinstating path %s.",
+	       dm_device_name(dm_table_get_md(m->ti->table)),
+	       pgpath->path.dev->name);
 	r = pgpath->pg->ps.type->reinstate_path(&pgpath->pg->ps, &pgpath->path);
 	if (r)
@@ -1617,7 +1627,8 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
 		struct path_selector *ps = &pgpath->pg->ps;
 		if (ps->type->end_io)
-			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes);
+			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes,
+					 clone->io_start_time_ns);
 	}
 	return r;
@@ -1640,7 +1651,7 @@ static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone,
 	if (atomic_read(&m->nr_valid_paths) == 0 &&
 	    !test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
-		if (must_push_back_bio(m)) {
+		if (__must_push_back(m)) {
 			r = DM_ENDIO_REQUEUE;
 		} else {
 			dm_report_EIO(m);
@@ -1661,23 +1672,27 @@ static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone,
 		struct path_selector *ps = &pgpath->pg->ps;
 		if (ps->type->end_io)
-			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes);
+			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes,
+					 dm_start_time_ns_from_clone(clone));
 	}
 	return r;
 }
 /*
- * Suspend can't complete until all the I/O is processed so if
- * the last path fails we must error any remaining I/O.
- * Note that if the freeze_bdev fails while suspending, the
- * queue_if_no_path state is lost - userspace should reset it.
+ * Suspend with flush can't complete until all the I/O is processed
+ * so if the last path fails we must error any remaining I/O.
+ * - Note that if the freeze_bdev fails while suspending, the
+ *   queue_if_no_path state is lost - userspace should reset it.
+ * Otherwise, during noflush suspend, queue_if_no_path will not change.
  */
 static void multipath_presuspend(struct dm_target *ti)
 {
 	struct multipath *m = ti->private;
-	queue_if_no_path(m, false, true);
+	/* FIXME: bio-based shouldn't need to always disable queue_if_no_path */
+	if (m->queue_mode == DM_TYPE_BIO_BASED || !dm_noflush_suspending(m->ti))
+		queue_if_no_path(m, false, true, __func__);
 }
 static void multipath_postsuspend(struct dm_target *ti)
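The new multipath_presuspend() only clears queue_if_no_path for bio-based tables or for suspends that flush. The sketch below restates that condition as a plain two-input predicate so all four combinations can be checked; the enum and function names are hypothetical stand-ins for m->queue_mode and dm_noflush_suspending(), not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the queue modes that matter here. */
enum model_queue_mode { MODEL_BIO_BASED, MODEL_REQUEST_BASED };

/*
 * Mirrors the condition added to multipath_presuspend(): return true
 * when presuspend would call queue_if_no_path(m, false, true, ...).
 */
static bool presuspend_disables_qifnp(enum model_queue_mode mode,
				      bool noflush_suspending)
{
	return mode == MODEL_BIO_BASED || !noflush_suspending;
}
```

Only the request-based, noflush case leaves queue_if_no_path untouched, which matches the updated comment ("during noflush suspend, queue_if_no_path will not change"); the FIXME notes that bio-based tables still always disable it.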
@@ -1698,8 +1713,16 @@ static void multipath_resume(struct dm_target *ti)
 	unsigned long flags;
 	spin_lock_irqsave(&m->lock, flags);
-	assign_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags,
-		   test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags));
+	if (test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) {
+		set_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
+		clear_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags);
+	}
+	DMDEBUG("%s: %s finished; QIFNP = %d; SQIFNP = %d",
+		dm_device_name(dm_table_get_md(m->ti->table)), __func__,
+		test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags),
+		test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags));
 	spin_unlock_irqrestore(&m->lock, flags);
 }
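Together, the reworked queue_if_no_path() and multipath_resume() stop a second "save" from overwriting a saved "enabled" value, and resume only sets queueing when something was saved. The following is a minimal userspace model of just the two flag bits, assuming plain bools in place of m->flags and ignoring m->lock; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model of the two multipath flag bits. Names are
 * hypothetical; the kernel keeps these bits in m->flags under m->lock.
 */
struct qifnp_model {
	bool qifnp;       /* models MPATHF_QUEUE_IF_NO_PATH */
	bool saved_qifnp; /* models MPATHF_SAVED_QUEUE_IF_NO_PATH */
};

/* Follows the branch structure of the patched queue_if_no_path(). */
static void model_queue_if_no_path(struct qifnp_model *s,
				   bool queue_if_no_path, bool save_old_value)
{
	bool cur = s->qifnp;
	bool saved = s->saved_qifnp;

	if (save_old_value) {
		/* Saving "off" over a saved "on" would lose state: refuse. */
		if (!cur && saved)
			fprintf(stderr, "QIFNP disabled but saved as enabled, not saving!\n");
		else
			s->saved_qifnp = cur;
	} else if (!queue_if_no_path && saved) {
		/* An explicit "fail_if_no_path" also drops the saved value. */
		s->saved_qifnp = false;
	}
	s->qifnp = queue_if_no_path;
}

/* Follows the patched multipath_resume(): restore only if saved. */
static void model_resume(struct qifnp_model *s)
{
	if (s->saved_qifnp) {
		s->qifnp = true;
		s->saved_qifnp = false;
	}
}
```

Starting from qifnp set, two consecutive model_queue_if_no_path(&s, false, true) calls leave saved_qifnp set, so model_resume() turns queueing back on. With the old assign_bit() expression the second save would have stored false and the enabled state would have been lost across resume.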
@@ -1859,13 +1882,13 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv,
 	if (argc == 1) {
 		if (!strcasecmp(argv[0], "queue_if_no_path")) {
-			r = queue_if_no_path(m, true, false);
+			r = queue_if_no_path(m, true, false, __func__);
 			spin_lock_irqsave(&m->lock, flags);
 			enable_nopath_timeout(m);
 			spin_unlock_irqrestore(&m->lock, flags);
 			goto out;
 		} else if (!strcasecmp(argv[0], "fail_if_no_path")) {
-			r = queue_if_no_path(m, false, false);
+			r = queue_if_no_path(m, false, false, __func__);
 			disable_nopath_timeout(m);
 			goto out;
 		}
@@ -1918,7 +1941,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti,
 	int r;
 	current_pgpath = READ_ONCE(m->current_pgpath);
-	if (!current_pgpath)
+	if (!current_pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
 		current_pgpath = choose_pgpath(m, 0);
 	if (current_pgpath) {
......
@@ -74,7 +74,7 @@ struct path_selector_type {
 	int (*start_io) (struct path_selector *ps, struct dm_path *path,
 			 size_t nr_bytes);
 	int (*end_io) (struct path_selector *ps, struct dm_path *path,
-		       size_t nr_bytes);
+		       size_t nr_bytes, u64 start_time);
 };
 /* Register a path selector */
......
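The end_io hook now also receives a u64 start_time: dm-mpath passes clone->io_start_time_ns for request-based I/O and dm_start_time_ns_from_clone(clone) for bio-based I/O. As an illustration only, a selector could turn that into a per-path latency figure. The struct, the function names, the explicit now argument and the rule that a start_time of 0 means "not recorded" are all assumptions of this sketch. The queue-length and service-time hunks shown here only change their signatures.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-path statistics. A real path selector would keep
 * something like this behind path->pscontext; these names are made up.
 */
struct model_path_info {
	uint64_t completed_ios;
	uint64_t total_bytes;
	uint64_t timed_ios;        /* completions that carried a start time */
	uint64_t total_latency_ns;
};

/*
 * Same inputs as the new end_io hook, minus the selector/path pointers.
 * start_time is the I/O start in nanoseconds; now is the completion
 * time, passed in so the model is deterministic. Treating
 * start_time == 0 as "not recorded" is an assumption of this sketch.
 */
static int model_end_io(struct model_path_info *pi, size_t nr_bytes,
			uint64_t start_time, uint64_t now)
{
	pi->completed_ios++;
	pi->total_bytes += nr_bytes;
	if (start_time != 0 && now >= start_time) {
		pi->timed_ios++;
		pi->total_latency_ns += now - start_time;
	}
	return 0;
}

/* Mean service time over the completions that had a timestamp. */
static uint64_t model_avg_latency_ns(const struct model_path_info *pi)
{
	return pi->timed_ios ? pi->total_latency_ns / pi->timed_ios : 0;
}
```

Keeping a separate timed_ios counter means completions without a timestamp still count toward throughput but do not drag the average latency toward zero.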
@@ -227,7 +227,7 @@ static int ql_start_io(struct path_selector *ps, struct dm_path *path,
 }
 static int ql_end_io(struct path_selector *ps, struct dm_path *path,
-		     size_t nr_bytes)
+		     size_t nr_bytes, u64 start_time)
 {
 	struct path_info *pi = path->pscontext;
......
@@ -254,7 +254,7 @@ struct raid_set {
 		int mode;
 	} journal_dev;
-	struct raid_dev dev[0];
+	struct raid_dev dev[];
 };
 static void rs_config_backup(struct raid_set *rs, struct rs_layout *l)
......
@@ -83,7 +83,7 @@ struct mirror_set {
 	struct work_struct trigger_event;
 	unsigned nr_mirrors;
-	struct mirror mirror[0];
+	struct mirror mirror[];
 };
 DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(raid1_resync_throttle,
......
@@ -309,7 +309,7 @@ static int st_start_io(struct path_selector *ps, struct dm_path *path,
 }
 static int st_end_io(struct path_selector *ps, struct dm_path *path,
-		     size_t nr_bytes)
+		     size_t nr_bytes, u64 start_time)
 {
 	struct path_info *pi = path->pscontext;
......
@@ -56,7 +56,7 @@ struct dm_stat {
 	size_t percpu_alloc_size;
 	size_t histogram_alloc_size;
 	struct dm_stat_percpu *stat_percpu[NR_CPUS];
-	struct dm_stat_shared stat_shared[0];
+	struct dm_stat_shared stat_shared[];
 };
 #define STAT_PRECISE_TIMESTAMPS		1
......
@@ -41,7 +41,7 @@ struct stripe_c {
 	/* Work struct used for triggering events*/
 	struct work_struct trigger_event;
-	struct stripe stripe[0];
+	struct stripe stripe[];
 };
 /*
......
@@ -53,7 +53,7 @@ struct switch_ctx {
 	/*
 	 * Array of dm devices to switch between.
 	 */
-	struct switch_path path_list[0];
+	struct switch_path path_list[];
 };
 static struct switch_ctx *alloc_switch_ctx(struct dm_target *ti, unsigned nr_paths,
......
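The raid_set, mirror_set, dm_stat, stripe_c and switch_ctx hunks replace GNU zero-length arrays (`[0]`) with C99 flexible array members (`[]`). The allocation pattern stays the same: one allocation holds the header followed by the trailing elements. A minimal userspace sketch, with hypothetical type names loosely modelled on stripe_c and a hand-written overflow check where kernel code would typically use struct_size():

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element and container types, loosely modelled on stripe_c. */
struct model_stripe {
	unsigned long long physical_start;
};

struct model_stripe_c {
	unsigned stripes;
	struct model_stripe stripe[];	/* C99 flexible array member */
};

/*
 * Allocate the header plus nr zeroed trailing elements in one block.
 * sizeof(*sc) does not include the flexible array, so the element
 * storage is added explicitly, after checking the multiplication
 * cannot overflow size_t.
 */
static struct model_stripe_c *alloc_model_stripe_c(size_t nr)
{
	struct model_stripe_c *sc;

	if (nr > (SIZE_MAX - sizeof(*sc)) / sizeof(sc->stripe[0]))
		return NULL;
	sc = calloc(1, sizeof(*sc) + nr * sizeof(sc->stripe[0]));
	if (sc)
		sc->stripes = (unsigned)nr;
	return sc;
}
```

Unlike `[0]`, a `[]` member is standard C, and compilers and fortify-style checkers can treat it as "unknown length" rather than "length zero" when reasoning about out-of-bounds accesses.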
@@ -234,10 +234,6 @@ static int persistent_memory_claim(struct dm_writecache *wc)
 	wc->memory_vmapped = false;
-	if (!wc->ssd_dev->dax_dev) {
-		r = -EOPNOTSUPP;
-		goto err1;
-	}
 	s = wc->memory_map_size;
 	p = s >> PAGE_SHIFT;
 	if (!p) {
@@ -1143,6 +1139,42 @@ static int writecache_message(struct dm_target *ti, unsigned argc, char **argv,
 	return r;
 }
+static void memcpy_flushcache_optimized(void *dest, void *source, size_t size)
+{
+	/*
+	 * clflushopt performs better with block size 1024, 2048, 4096
+	 * non-temporal stores perform better with block size 512
+	 *
+	 * block size   512             1024            2048            4096
+	 * movnti       496 MB/s        642 MB/s        725 MB/s        744 MB/s
+	 * clflushopt   373 MB/s        688 MB/s        1.1 GB/s        1.2 GB/s
+	 *
+	 * We see that movnti performs better for 512-byte blocks, and
+	 * clflushopt performs better for 1024-byte and larger blocks. So, we
+	 * prefer clflushopt for sizes >= 768.
+	 *
+	 * NOTE: this happens to be the case now (with dm-writecache's single
+	 * threaded model) but re-evaluate this once memcpy_flushcache() is
+	 * enabled to use movdir64b which might invalidate this performance
+	 * advantage seen with cache-allocating-writes plus flushing.
+	 */
+#ifdef CONFIG_X86
+	if (static_cpu_has(X86_FEATURE_CLFLUSHOPT) &&
+	    likely(boot_cpu_data.x86_clflush_size == 64) &&
+	    likely(size >= 768)) {
+		do {
+			memcpy((void *)dest, (void *)source, 64);
+			clflushopt((void *)dest);
+			dest += 64;
+			source += 64;
+			size -= 64;
+		} while (size >= 64);
+		return;
+	}
+#endif
+	memcpy_flushcache(dest, source, size);
+}
 static void bio_copy_block(struct dm_writecache *wc, struct bio *bio, void *data)
 {
 	void *buf;
@@ -1168,7 +1200,7 @@ static void bio_copy_block(struct dm_writecache *wc, struct bio *bio, void *data
 			}
 		} else {
 			flush_dcache_page(bio_page(bio));
-			memcpy_flushcache(data, buf, size);
+			memcpy_flushcache_optimized(data, buf, size);
 		}
 		bvec_kunmap_irq(buf, &flags);
......
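memcpy_flushcache_optimized() chooses between two strategies by size. For copies of 768 bytes or more it does a 64-byte memcpy() followed by clflushopt() per cache line. Otherwise it falls back to memcpy_flushcache(). The portable sketch below models only that size-based dispatch. model_flush_line() is a hypothetical counter standing in for clflushopt(), plain memcpy() stands in for memcpy_flushcache(), the CPU feature checks are omitted, and nothing is actually made persistent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_LINE_SIZE 64             /* mirrors the x86_clflush_size == 64 check */
#define MODEL_CLFLUSHOPT_THRESHOLD 768 /* cut-over size used by the patch */

/* Hypothetical flush hook: counts lines instead of issuing clflushopt. */
static size_t model_flushed_lines;

static void model_flush_line(void *addr)
{
	(void)addr;
	model_flushed_lines++;
}

/*
 * Returns true when the line-by-line copy+flush path was taken, false
 * when the small-copy fallback (standing in for memcpy_flushcache())
 * was used.
 */
static bool model_copy_flushcache(void *dest, const void *source, size_t size)
{
	unsigned char *d = dest;
	const unsigned char *s = source;

	if (size >= MODEL_CLFLUSHOPT_THRESHOLD) {
		do {
			memcpy(d, s, MODEL_LINE_SIZE);
			model_flush_line(d);
			d += MODEL_LINE_SIZE;
			s += MODEL_LINE_SIZE;
			size -= MODEL_LINE_SIZE;
		} while (size >= MODEL_LINE_SIZE);
		/* Unlike the kernel loop, also copy any sub-line tail (unflushed). */
		if (size)
			memcpy(d, s, size);
		return true;
	}
	memcpy(dest, source, size);
	return false;
}
```

The kernel loop above stops as soon as fewer than 64 bytes remain and returns without handling a tail. The model copies that tail explicitly so it stays correct for arbitrary sizes.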