Commit 3241b1d3 authored by Joe Thornber, committed by Alasdair G Kergon

dm: add persistent data library

The persistent-data library offers a re-usable framework for the storage
and management of on-disk metadata in device-mapper targets.

It's used by the thin-provisioning target in the next patch and in an
upcoming hierarchical storage target.

For further information, please read
Documentation/device-mapper/persistent-data.txt
Signed-off-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
parent 95d402f0
Introduction
============
The more-sophisticated device-mapper targets require complex metadata
that is managed in kernel. In late 2010 we were seeing that various
targets were rolling their own data structures, for example:
- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips
Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.
The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets. It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.
Overview
========
The main documentation is in the header files which can all be found
under drivers/md/persistent-data.
The block manager
-----------------
dm-block-manager.[hc]
This provides access to the data on disk in fixed-sized blocks. There
is a read/write locking interface to prevent concurrent accesses, and
to keep recently-used data in a cache.
Clients of persistent-data are unlikely to use this directly.
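
Although clients rarely call it directly, for illustration a read
might look like this (a sketch with error handling elided; bm,
block_nr and my_validator stand in for the caller's objects):

	struct dm_block *b;

	if (!dm_bm_read_lock(bm, block_nr, &my_validator, &b)) {
		const void *data = dm_block_data(b);
		/* ... inspect the block contents ... */
		dm_bm_unlock(b);
	}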
The transaction manager
-----------------------
dm-transaction-manager.[hc]
This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one. Shadowing is elided within
the same transaction so performance is reasonable. The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.
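
A typical update within a transaction might look like this (a sketch
based on the interface in dm-transaction-manager.h; error handling
elided, and tm, orig_block, validator and superblock stand in for the
caller's objects):

	struct dm_block *b;
	int inc;

	/* Copy-on-write: get a writable shadow of an existing block. */
	dm_tm_shadow_block(tm, orig_block, &validator, &b, &inc);
	/* If inc is set, the block's children must have their
	 * reference counts incremented too. */
	/* ... modify dm_block_data(b) ... */
	dm_tm_unlock(tm, b);

	/* Commit: all dirty blocks are flushed before the superblock. */
	dm_tm_pre_commit(tm);
	dm_tm_commit(tm, superblock);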
The Space Maps
--------------
dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]
On-disk data structures that keep track of reference counts of blocks.
They also act as the allocator of new blocks. There are currently two
implementations: a simpler one for managing blocks on a different
device (e.g. thinly-provisioned data blocks); and one for managing
the metadata space. The latter is complicated by the need to store
its own data within the space it's managing.
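
For example, allocating and sharing a block through the interface in
dm-space-map.h (a sketch; error handling elided):

	dm_block_t b;

	dm_sm_new_block(sm, &b);	/* allocate: reference count 1 */
	dm_sm_inc_block(sm, b);		/* now shared: reference count 2 */
	dm_sm_dec_block(sm, b);
	dm_sm_dec_block(sm, b);		/* count 0: free after next commit */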
The data structures
-------------------
dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h
Currently there is only one data structure, a hierarchical btree.
There are plans to add more. For example, something with an
array-like interface would see a lot of use.
The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys. For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.
Values stored in the btrees can be of arbitrary size. Keys are always
64 bits, although nesting allows you to use multiple keys.
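
For example, a lookup in such a two-level tree passes one key per
level (a sketch using the interface in dm-btree.h; info and root stand
in for the caller's objects):

	uint64_t keys[2] = { dev_id, virtual_block };
	__le64 value_le;
	int r;

	/* info.levels == 2 for this nested arrangement. */
	r = dm_btree_lookup(&info, root, keys, &value_le);
	/* r == -ENODATA means there is no mapping for this pair. */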
config DM_PERSISTENT_DATA
tristate
depends on BLK_DEV_DM && EXPERIMENTAL
select LIBCRC32C
select DM_BUFIO
---help---
Library providing immutable on-disk data structure support for
device-mapper targets such as the thin provisioning target.
obj-$(CONFIG_DM_PERSISTENT_DATA) += dm-persistent-data.o
dm-persistent-data-objs := \
dm-block-manager.o \
dm-space-map-checker.o \
dm-space-map-common.o \
dm-space-map-disk.o \
dm-space-map-metadata.o \
dm-transaction-manager.o \
dm-btree.o \
dm-btree-remove.o \
dm-btree-spine.o
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef _LINUX_DM_BLOCK_MANAGER_H
#define _LINUX_DM_BLOCK_MANAGER_H
#include <linux/types.h>
#include <linux/blkdev.h>
/*----------------------------------------------------------------*/
/*
* Block number.
*/
typedef uint64_t dm_block_t;
struct dm_block;
dm_block_t dm_block_location(struct dm_block *b);
void *dm_block_data(struct dm_block *b);
/*----------------------------------------------------------------*/
/*
* @name should be a unique identifier for the block manager, no longer
* than 32 chars.
*
* @max_held_per_thread should be the maximum number of locks, read or
* write, that an individual thread holds at any one time.
*/
struct dm_block_manager;
struct dm_block_manager *dm_block_manager_create(
struct block_device *bdev, unsigned block_size,
unsigned cache_size, unsigned max_held_per_thread);
void dm_block_manager_destroy(struct dm_block_manager *bm);
unsigned dm_bm_block_size(struct dm_block_manager *bm);
dm_block_t dm_bm_nr_blocks(struct dm_block_manager *bm);
/*----------------------------------------------------------------*/
/*
* The validator allows the caller to verify newly-read data and modify
* the data just before writing, e.g. to calculate checksums. It's
* important to be consistent with your use of validators. The only time
* you can change validators is if you call dm_bm_write_lock_zero.
*/
struct dm_block_validator {
const char *name;
void (*prepare_for_write)(struct dm_block_validator *v, struct dm_block *b, size_t block_size);
/*
* Return 0 if the checksum is valid or < 0 on error.
*/
int (*check)(struct dm_block_validator *v, struct dm_block *b, size_t block_size);
};
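/*
 * For illustration only (a sketch, not part of this interface): a
 * minimal validator that checksums the whole block, with EXAMPLE_XOR a
 * made-up seed, might look like this:
 *
 *	static void ex_prepare_for_write(struct dm_block_validator *v,
 *					 struct dm_block *b, size_t block_size)
 *	{
 *		__le32 *csum = dm_block_data(b);
 *
 *		*csum = cpu_to_le32(dm_bm_checksum(csum + 1,
 *						   block_size - sizeof(__le32),
 *						   EXAMPLE_XOR));
 *	}
 *
 *	static int ex_check(struct dm_block_validator *v,
 *			    struct dm_block *b, size_t block_size)
 *	{
 *		__le32 *csum = dm_block_data(b);
 *		__le32 expected = cpu_to_le32(dm_bm_checksum(csum + 1,
 *						block_size - sizeof(__le32),
 *						EXAMPLE_XOR));
 *
 *		return *csum == expected ? 0 : -EILSEQ;
 *	}
 *
 *	static struct dm_block_validator example_validator = {
 *		.name = "example",
 *		.prepare_for_write = ex_prepare_for_write,
 *		.check = ex_check,
 *	};
 */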
/*----------------------------------------------------------------*/
/*
* You can have multiple concurrent readers or a single writer holding a
* block lock.
*/
/*
* The dm_bm_*_lock() functions lock a block and return through @result a pointer to
* memory that holds a copy of that block. If you have write-locked the
* block then any changes you make to memory pointed to by @result will be
* written back to the disk sometime after dm_bm_unlock is called.
*/
int dm_bm_read_lock(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
int dm_bm_write_lock(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
/*
* The *_try_lock variants return -EWOULDBLOCK if the block isn't
* available immediately.
*/
int dm_bm_read_try_lock(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
/*
* Use dm_bm_write_lock_zero() when you know you're going to
* overwrite the block completely. It saves a disk read.
*/
int dm_bm_write_lock_zero(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
int dm_bm_unlock(struct dm_block *b);
/*
* An optimisation; we often want to copy a block's contents to a new
* block, e.g. as part of the shadowing operation. It's far better for
* bufio to do this move behind the scenes than hold 2 locks and memcpy the
* data.
*/
int dm_bm_unlock_move(struct dm_block *b, dm_block_t n);
/*
* It's a common idiom to have a superblock that should be committed last.
*
* @superblock should be write-locked on entry. It will be unlocked during
* this function. All dirty blocks are guaranteed to be written and flushed
* before the superblock.
*
* This method always blocks.
*/
int dm_bm_flush_and_unlock(struct dm_block_manager *bm,
struct dm_block *superblock);
u32 dm_bm_checksum(const void *data, size_t len, u32 init_xor);
/*----------------------------------------------------------------*/
#endif /* _LINUX_DM_BLOCK_MANAGER_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef DM_BTREE_INTERNAL_H
#define DM_BTREE_INTERNAL_H
#include "dm-btree.h"
/*----------------------------------------------------------------*/
/*
* We'll need 2 accessor functions for n->csum and n->blocknr
* to support dm-btree-spine.c.
*/
enum node_flags {
INTERNAL_NODE = 1,
LEAF_NODE = 1 << 1
};
/*
* Every btree node begins with this structure. Make sure it's a multiple
* of 8 bytes in size, otherwise the 64-bit keys will be misaligned.
*/
struct node_header {
__le32 csum;
__le32 flags;
__le64 blocknr; /* Block this node is supposed to live in. */
__le32 nr_entries;
__le32 max_entries;
__le32 value_size;
__le32 padding;
} __packed;
struct node {
struct node_header header;
__le64 keys[0];
} __packed;
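/*
 * The number of entries that fit in a node follows from the layout
 * above; node_check() in dm-btree-spine.c enforces the same invariant:
 *
 *	sizeof(struct node_header) +
 *		(sizeof(__le64) + value_size) * max_entries <= block_size
 */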
void inc_children(struct dm_transaction_manager *tm, struct node *n,
struct dm_btree_value_type *vt);
int new_block(struct dm_btree_info *info, struct dm_block **result);
int unlock_block(struct dm_btree_info *info, struct dm_block *b);
/*
* Spines keep track of the rolling locks. There are 2 variants, read-only
* and one that uses shadowing. These are separate structs to allow the
* type checker to spot misuse, for example accidentally calling read_lock
* on a shadow spine.
*/
struct ro_spine {
struct dm_btree_info *info;
int count;
struct dm_block *nodes[2];
};
void init_ro_spine(struct ro_spine *s, struct dm_btree_info *info);
int exit_ro_spine(struct ro_spine *s);
int ro_step(struct ro_spine *s, dm_block_t new_child);
struct node *ro_node(struct ro_spine *s);
struct shadow_spine {
struct dm_btree_info *info;
int count;
struct dm_block *nodes[2];
dm_block_t root;
};
void init_shadow_spine(struct shadow_spine *s, struct dm_btree_info *info);
int exit_shadow_spine(struct shadow_spine *s);
int shadow_step(struct shadow_spine *s, dm_block_t b,
struct dm_btree_value_type *vt);
/*
* The spine must have at least one entry before calling this.
*/
struct dm_block *shadow_current(struct shadow_spine *s);
/*
* The spine must have at least two entries before calling this.
*/
struct dm_block *shadow_parent(struct shadow_spine *s);
int shadow_has_parent(struct shadow_spine *s);
int shadow_root(struct shadow_spine *s);
/*
* Some inlines.
*/
static inline __le64 *key_ptr(struct node *n, uint32_t index)
{
return n->keys + index;
}
static inline void *value_base(struct node *n)
{
return &n->keys[le32_to_cpu(n->header.max_entries)];
}
/*
* FIXME: Now that value size is stored in node we don't need the third parm.
*/
static inline void *value_ptr(struct node *n, uint32_t index, size_t value_size)
{
BUG_ON(value_size != le32_to_cpu(n->header.value_size));
return value_base(n) + (value_size * index);
}
/*
* Assumes the values are suitably-aligned and converts to core format.
*/
static inline uint64_t value64(struct node *n, uint32_t index)
{
__le64 *values_le = value_base(n);
return le64_to_cpu(values_le[index]);
}
/*
* Searching for a key within a single node.
*/
int lower_bound(struct node *n, uint64_t key);
extern struct dm_block_validator btree_node_validator;
#endif /* DM_BTREE_INTERNAL_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#include "dm-btree-internal.h"
#include "dm-transaction-manager.h"
#include <linux/device-mapper.h>
#define DM_MSG_PREFIX "btree spine"
/*----------------------------------------------------------------*/
#define BTREE_CSUM_XOR 121107
static int node_check(struct dm_block_validator *v,
struct dm_block *b,
size_t block_size);
static void node_prepare_for_write(struct dm_block_validator *v,
struct dm_block *b,
size_t block_size)
{
struct node *n = dm_block_data(b);
struct node_header *h = &n->header;
h->blocknr = cpu_to_le64(dm_block_location(b));
h->csum = cpu_to_le32(dm_bm_checksum(&h->flags,
block_size - sizeof(__le32),
BTREE_CSUM_XOR));
	BUG_ON(node_check(v, b, block_size));
}
static int node_check(struct dm_block_validator *v,
struct dm_block *b,
size_t block_size)
{
struct node *n = dm_block_data(b);
struct node_header *h = &n->header;
size_t value_size;
__le32 csum_disk;
uint32_t flags;
if (dm_block_location(b) != le64_to_cpu(h->blocknr)) {
DMERR("node_check failed blocknr %llu wanted %llu",
le64_to_cpu(h->blocknr), dm_block_location(b));
return -ENOTBLK;
}
csum_disk = cpu_to_le32(dm_bm_checksum(&h->flags,
block_size - sizeof(__le32),
BTREE_CSUM_XOR));
if (csum_disk != h->csum) {
DMERR("node_check failed csum %u wanted %u",
le32_to_cpu(csum_disk), le32_to_cpu(h->csum));
return -EILSEQ;
}
value_size = le32_to_cpu(h->value_size);
if (sizeof(struct node_header) +
(sizeof(__le64) + value_size) * le32_to_cpu(h->max_entries) > block_size) {
DMERR("node_check failed: max_entries too large");
return -EILSEQ;
}
if (le32_to_cpu(h->nr_entries) > le32_to_cpu(h->max_entries)) {
DMERR("node_check failed, too many entries");
return -EILSEQ;
}
/*
* The node must be either INTERNAL or LEAF.
*/
flags = le32_to_cpu(h->flags);
if (!(flags & INTERNAL_NODE) && !(flags & LEAF_NODE)) {
DMERR("node_check failed, node is neither INTERNAL or LEAF");
return -EILSEQ;
}
return 0;
}
struct dm_block_validator btree_node_validator = {
.name = "btree_node",
.prepare_for_write = node_prepare_for_write,
.check = node_check
};
/*----------------------------------------------------------------*/
static int bn_read_lock(struct dm_btree_info *info, dm_block_t b,
struct dm_block **result)
{
return dm_tm_read_lock(info->tm, b, &btree_node_validator, result);
}
static int bn_shadow(struct dm_btree_info *info, dm_block_t orig,
struct dm_btree_value_type *vt,
struct dm_block **result)
{
int r, inc;
r = dm_tm_shadow_block(info->tm, orig, &btree_node_validator,
result, &inc);
if (!r && inc)
inc_children(info->tm, dm_block_data(*result), vt);
return r;
}
int new_block(struct dm_btree_info *info, struct dm_block **result)
{
return dm_tm_new_block(info->tm, &btree_node_validator, result);
}
int unlock_block(struct dm_btree_info *info, struct dm_block *b)
{
return dm_tm_unlock(info->tm, b);
}
/*----------------------------------------------------------------*/
void init_ro_spine(struct ro_spine *s, struct dm_btree_info *info)
{
s->info = info;
s->count = 0;
s->nodes[0] = NULL;
s->nodes[1] = NULL;
}
int exit_ro_spine(struct ro_spine *s)
{
int r = 0, i;
for (i = 0; i < s->count; i++) {
int r2 = unlock_block(s->info, s->nodes[i]);
if (r2 < 0)
r = r2;
}
return r;
}
int ro_step(struct ro_spine *s, dm_block_t new_child)
{
int r;
if (s->count == 2) {
r = unlock_block(s->info, s->nodes[0]);
if (r < 0)
return r;
s->nodes[0] = s->nodes[1];
s->count--;
}
r = bn_read_lock(s->info, new_child, s->nodes + s->count);
if (!r)
s->count++;
return r;
}
struct node *ro_node(struct ro_spine *s)
{
struct dm_block *block;
BUG_ON(!s->count);
block = s->nodes[s->count - 1];
return dm_block_data(block);
}
/*----------------------------------------------------------------*/
void init_shadow_spine(struct shadow_spine *s, struct dm_btree_info *info)
{
s->info = info;
s->count = 0;
}
int exit_shadow_spine(struct shadow_spine *s)
{
int r = 0, i;
for (i = 0; i < s->count; i++) {
int r2 = unlock_block(s->info, s->nodes[i]);
if (r2 < 0)
r = r2;
}
return r;
}
int shadow_step(struct shadow_spine *s, dm_block_t b,
struct dm_btree_value_type *vt)
{
int r;
if (s->count == 2) {
r = unlock_block(s->info, s->nodes[0]);
if (r < 0)
return r;
s->nodes[0] = s->nodes[1];
s->count--;
}
r = bn_shadow(s->info, b, vt, s->nodes + s->count);
if (!r) {
if (!s->count)
s->root = dm_block_location(s->nodes[0]);
s->count++;
}
return r;
}
struct dm_block *shadow_current(struct shadow_spine *s)
{
BUG_ON(!s->count);
return s->nodes[s->count - 1];
}
struct dm_block *shadow_parent(struct shadow_spine *s)
{
BUG_ON(s->count != 2);
return s->count == 2 ? s->nodes[0] : NULL;
}
int shadow_has_parent(struct shadow_spine *s)
{
return s->count >= 2;
}
int shadow_root(struct shadow_spine *s)
{
return s->root;
}
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef _LINUX_DM_BTREE_H
#define _LINUX_DM_BTREE_H
#include "dm-block-manager.h"
struct dm_transaction_manager;
/*----------------------------------------------------------------*/
/*
* Annotations used to check on-disk metadata is handled as little-endian.
*/
#ifdef __CHECKER__
# define __dm_written_to_disk(x) __releases(x)
# define __dm_reads_from_disk(x) __acquires(x)
# define __dm_bless_for_disk(x) __acquire(x)
# define __dm_unbless_for_disk(x) __release(x)
#else
# define __dm_written_to_disk(x)
# define __dm_reads_from_disk(x)
# define __dm_bless_for_disk(x)
# define __dm_unbless_for_disk(x)
#endif
/*----------------------------------------------------------------*/
/*
* Manipulates hierarchical B+ trees with 64-bit keys and arbitrary-sized
* values.
*/
/*
* Information about the values stored within the btree.
*/
struct dm_btree_value_type {
void *context;
/*
* The size in bytes of each value.
*/
uint32_t size;
/*
* Any of these methods can be safely set to NULL if you do not
* need the corresponding feature.
*/
/*
* The btree is making a duplicate of the value, for instance
* because previously-shared btree nodes have now diverged.
* @value argument is the new copy that the copy function may modify.
* (Probably it just wants to increment a reference count
* somewhere.) This method is _not_ called for insertion of a new
* value: It is assumed the ref count is already 1.
*/
void (*inc)(void *context, void *value);
/*
* This value is being deleted. The btree takes care of freeing
* the memory pointed to by @value. Often the del function just
* needs to decrement a reference count somewhere.
*/
void (*dec)(void *context, void *value);
/*
* A test for equality between two values. When a value is
* overwritten with a new one, the old one has the dec method
* called _unless_ the new and old value are deemed equal.
*/
int (*equal)(void *context, void *value1, void *value2);
};
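/*
 * For illustration (a sketch, not part of this interface): values that
 * need no reference counting can leave all three methods NULL, e.g.
 * for plain little-endian 64-bit values:
 *
 *	struct dm_btree_value_type vt = {
 *		.context = NULL,
 *		.size = sizeof(__le64),
 *		.inc = NULL,
 *		.dec = NULL,
 *		.equal = NULL,
 *	};
 */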
/*
* The shape and contents of a btree.
*/
struct dm_btree_info {
struct dm_transaction_manager *tm;
/*
* Number of nested btrees. (Not the depth of a single tree.)
*/
unsigned levels;
struct dm_btree_value_type value_type;
};
/*
* Set up an empty tree. O(1).
*/
int dm_btree_empty(struct dm_btree_info *info, dm_block_t *root);
/*
* Delete a tree. O(n) - this is the slow one! It can also block, so
* please don't call it on an IO path.
*/
int dm_btree_del(struct dm_btree_info *info, dm_block_t root);
/*
* All the lookup functions return -ENODATA if the key cannot be found.
*/
/*
* Tries to find a key that matches exactly. O(ln(n))
*/
int dm_btree_lookup(struct dm_btree_info *info, dm_block_t root,
uint64_t *keys, void *value_le);
/*
* Insertion (or overwrite an existing value). O(ln(n))
*/
int dm_btree_insert(struct dm_btree_info *info, dm_block_t root,
uint64_t *keys, void *value, dm_block_t *new_root)
__dm_written_to_disk(value);
/*
* A variant of insert that indicates whether it actually inserted or just
* overwrote. Useful if you're keeping track of the number of entries in a
* tree.
*/
int dm_btree_insert_notify(struct dm_btree_info *info, dm_block_t root,
uint64_t *keys, void *value, dm_block_t *new_root,
int *inserted)
__dm_written_to_disk(value);
/*
* Remove a key if present. This doesn't remove empty sub trees. Normally
* subtrees represent a separate entity, like a snapshot map, so this is
* correct behaviour. O(ln(n)).
*/
int dm_btree_remove(struct dm_btree_info *info, dm_block_t root,
uint64_t *keys, dm_block_t *new_root);
/*
* Returns < 0 on failure. Otherwise the number of key entries that have
* been filled out. Remember trees can have zero entries, and as such have
* no highest key.
*/
int dm_btree_find_highest_key(struct dm_btree_info *info, dm_block_t root,
uint64_t *result_keys);
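/*
 * Typical usage (a sketch; error handling elided). A single-level tree
 * mapping a 64-bit key to a little-endian 64-bit value:
 *
 *	struct dm_btree_info info = {
 *		.tm = tm,
 *		.levels = 1,
 *		.value_type = { .size = sizeof(__le64) },
 *	};
 *	dm_block_t root;
 *	uint64_t key = 123;
 *	__le64 value = cpu_to_le64(456);
 *
 *	dm_btree_empty(&info, &root);
 *	__dm_bless_for_disk(&value);
 *	dm_btree_insert(&info, root, &key, &value, &root);
 */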
#endif /* _LINUX_DM_BTREE_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef _DM_PERSISTENT_DATA_INTERNAL_H
#define _DM_PERSISTENT_DATA_INTERNAL_H
#include "dm-block-manager.h"
static inline unsigned dm_hash_block(dm_block_t b, unsigned hash_mask)
{
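	/* Multiplicative hashing; BIG_PRIME is the largest prime below 2^32. */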
const unsigned BIG_PRIME = 4294967291UL;
return (((unsigned) b) * BIG_PRIME) & hash_mask;
}
#endif /* _DM_PERSISTENT_DATA_INTERNAL_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#include "dm-space-map-checker.h"
#include <linux/device-mapper.h>
#ifdef CONFIG_DM_DEBUG_SPACE_MAPS
#define DM_MSG_PREFIX "space map checker"
/*----------------------------------------------------------------*/
struct count_array {
dm_block_t nr;
dm_block_t nr_free;
uint32_t *counts;
};
static int ca_get_count(struct count_array *ca, dm_block_t b, uint32_t *count)
{
if (b >= ca->nr)
return -EINVAL;
*count = ca->counts[b];
return 0;
}
static int ca_count_more_than_one(struct count_array *ca, dm_block_t b, int *r)
{
if (b >= ca->nr)
return -EINVAL;
*r = ca->counts[b] > 1;
return 0;
}
static int ca_set_count(struct count_array *ca, dm_block_t b, uint32_t count)
{
uint32_t old_count;
if (b >= ca->nr)
return -EINVAL;
old_count = ca->counts[b];
if (!count && old_count)
ca->nr_free++;
else if (count && !old_count)
ca->nr_free--;
ca->counts[b] = count;
return 0;
}
static int ca_inc_block(struct count_array *ca, dm_block_t b)
{
if (b >= ca->nr)
return -EINVAL;
ca_set_count(ca, b, ca->counts[b] + 1);
return 0;
}
static int ca_dec_block(struct count_array *ca, dm_block_t b)
{
if (b >= ca->nr)
return -EINVAL;
BUG_ON(ca->counts[b] == 0);
ca_set_count(ca, b, ca->counts[b] - 1);
return 0;
}
static int ca_create(struct count_array *ca, struct dm_space_map *sm)
{
int r;
dm_block_t nr_blocks;
r = dm_sm_get_nr_blocks(sm, &nr_blocks);
if (r)
return r;
ca->nr = nr_blocks;
ca->nr_free = nr_blocks;
ca->counts = kzalloc(sizeof(*ca->counts) * nr_blocks, GFP_KERNEL);
if (!ca->counts)
return -ENOMEM;
return 0;
}
static int ca_load(struct count_array *ca, struct dm_space_map *sm)
{
int r;
uint32_t count;
dm_block_t nr_blocks, i;
r = dm_sm_get_nr_blocks(sm, &nr_blocks);
if (r)
return r;
BUG_ON(ca->nr != nr_blocks);
DMWARN("Loading debug space map from disk. This may take some time");
for (i = 0; i < nr_blocks; i++) {
r = dm_sm_get_count(sm, i, &count);
if (r) {
DMERR("load failed");
return r;
}
ca_set_count(ca, i, count);
}
DMWARN("Load complete");
return 0;
}
static int ca_extend(struct count_array *ca, dm_block_t extra_blocks)
{
dm_block_t nr_blocks = ca->nr + extra_blocks;
uint32_t *counts = kzalloc(sizeof(*counts) * nr_blocks, GFP_KERNEL);
if (!counts)
return -ENOMEM;
memcpy(counts, ca->counts, sizeof(*counts) * ca->nr);
kfree(ca->counts);
ca->nr = nr_blocks;
ca->nr_free += extra_blocks;
ca->counts = counts;
return 0;
}
static int ca_commit(struct count_array *old, struct count_array *new)
{
if (old->nr != new->nr) {
BUG_ON(old->nr > new->nr);
ca_extend(old, new->nr - old->nr);
}
BUG_ON(old->nr != new->nr);
old->nr_free = new->nr_free;
memcpy(old->counts, new->counts, sizeof(*old->counts) * old->nr);
return 0;
}
static void ca_destroy(struct count_array *ca)
{
kfree(ca->counts);
}
/*----------------------------------------------------------------*/
struct sm_checker {
struct dm_space_map sm;
struct count_array old_counts;
struct count_array counts;
struct dm_space_map *real_sm;
};
static void sm_checker_destroy(struct dm_space_map *sm)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
dm_sm_destroy(smc->real_sm);
ca_destroy(&smc->old_counts);
ca_destroy(&smc->counts);
kfree(smc);
}
static int sm_checker_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int r = dm_sm_get_nr_blocks(smc->real_sm, count);
if (!r)
BUG_ON(smc->old_counts.nr != *count);
return r;
}
static int sm_checker_get_nr_free(struct dm_space_map *sm, dm_block_t *count)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int r = dm_sm_get_nr_free(smc->real_sm, count);
if (!r) {
/*
* Slow, but we know it's correct.
*/
dm_block_t b, n = 0;
for (b = 0; b < smc->old_counts.nr; b++)
if (smc->old_counts.counts[b] == 0 &&
smc->counts.counts[b] == 0)
n++;
if (n != *count)
DMERR("free block counts differ, checker %u, sm-disk:%u",
(unsigned) n, (unsigned) *count);
}
return r;
}
static int sm_checker_new_block(struct dm_space_map *sm, dm_block_t *b)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int r = dm_sm_new_block(smc->real_sm, b);
if (!r) {
BUG_ON(*b >= smc->old_counts.nr);
BUG_ON(smc->old_counts.counts[*b] != 0);
BUG_ON(*b >= smc->counts.nr);
BUG_ON(smc->counts.counts[*b] != 0);
ca_set_count(&smc->counts, *b, 1);
}
return r;
}
static int sm_checker_inc_block(struct dm_space_map *sm, dm_block_t b)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int r = dm_sm_inc_block(smc->real_sm, b);
int r2 = ca_inc_block(&smc->counts, b);
BUG_ON(r != r2);
return r;
}
static int sm_checker_dec_block(struct dm_space_map *sm, dm_block_t b)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int r = dm_sm_dec_block(smc->real_sm, b);
int r2 = ca_dec_block(&smc->counts, b);
BUG_ON(r != r2);
return r;
}
static int sm_checker_get_count(struct dm_space_map *sm, dm_block_t b, uint32_t *result)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
uint32_t result2 = 0;
int r = dm_sm_get_count(smc->real_sm, b, result);
int r2 = ca_get_count(&smc->counts, b, &result2);
BUG_ON(r != r2);
if (!r)
BUG_ON(*result != result2);
return r;
}
static int sm_checker_count_more_than_one(struct dm_space_map *sm, dm_block_t b, int *result)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int result2 = 0;
int r = dm_sm_count_is_more_than_one(smc->real_sm, b, result);
int r2 = ca_count_more_than_one(&smc->counts, b, &result2);
BUG_ON(r != r2);
if (!r)
BUG_ON(!(*result) && result2);
return r;
}
static int sm_checker_set_count(struct dm_space_map *sm, dm_block_t b, uint32_t count)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
	int r = dm_sm_set_count(smc->real_sm, b, count);
	int r2;
	BUG_ON(b >= smc->counts.nr);
	r2 = ca_set_count(&smc->counts, b, count);
BUG_ON(r != r2);
return r;
}
static int sm_checker_commit(struct dm_space_map *sm)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int r;
r = dm_sm_commit(smc->real_sm);
if (r)
return r;
r = ca_commit(&smc->old_counts, &smc->counts);
if (r)
return r;
return 0;
}
static int sm_checker_extend(struct dm_space_map *sm, dm_block_t extra_blocks)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
int r = dm_sm_extend(smc->real_sm, extra_blocks);
if (r)
return r;
return ca_extend(&smc->counts, extra_blocks);
}
static int sm_checker_root_size(struct dm_space_map *sm, size_t *result)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
return dm_sm_root_size(smc->real_sm, result);
}
static int sm_checker_copy_root(struct dm_space_map *sm, void *copy_to_here_le, size_t len)
{
struct sm_checker *smc = container_of(sm, struct sm_checker, sm);
return dm_sm_copy_root(smc->real_sm, copy_to_here_le, len);
}
/*----------------------------------------------------------------*/
static struct dm_space_map ops_ = {
.destroy = sm_checker_destroy,
.get_nr_blocks = sm_checker_get_nr_blocks,
.get_nr_free = sm_checker_get_nr_free,
.inc_block = sm_checker_inc_block,
.dec_block = sm_checker_dec_block,
.new_block = sm_checker_new_block,
.get_count = sm_checker_get_count,
.count_is_more_than_one = sm_checker_count_more_than_one,
.set_count = sm_checker_set_count,
.commit = sm_checker_commit,
.extend = sm_checker_extend,
.root_size = sm_checker_root_size,
.copy_root = sm_checker_copy_root
};
struct dm_space_map *dm_sm_checker_create(struct dm_space_map *sm)
{
int r;
struct sm_checker *smc;
if (!sm)
return NULL;
smc = kmalloc(sizeof(*smc), GFP_KERNEL);
if (!smc)
return NULL;
memcpy(&smc->sm, &ops_, sizeof(smc->sm));
r = ca_create(&smc->old_counts, sm);
if (r) {
kfree(smc);
return NULL;
}
r = ca_create(&smc->counts, sm);
if (r) {
ca_destroy(&smc->old_counts);
kfree(smc);
return NULL;
}
smc->real_sm = sm;
r = ca_load(&smc->counts, sm);
if (r) {
ca_destroy(&smc->counts);
ca_destroy(&smc->old_counts);
kfree(smc);
return NULL;
}
r = ca_commit(&smc->old_counts, &smc->counts);
if (r) {
ca_destroy(&smc->counts);
ca_destroy(&smc->old_counts);
kfree(smc);
return NULL;
}
return &smc->sm;
}
EXPORT_SYMBOL_GPL(dm_sm_checker_create);
struct dm_space_map *dm_sm_checker_create_fresh(struct dm_space_map *sm)
{
int r;
struct sm_checker *smc;
if (!sm)
return NULL;
smc = kmalloc(sizeof(*smc), GFP_KERNEL);
if (!smc)
return NULL;
memcpy(&smc->sm, &ops_, sizeof(smc->sm));
r = ca_create(&smc->old_counts, sm);
if (r) {
kfree(smc);
return NULL;
}
r = ca_create(&smc->counts, sm);
if (r) {
ca_destroy(&smc->old_counts);
kfree(smc);
return NULL;
}
smc->real_sm = sm;
return &smc->sm;
}
EXPORT_SYMBOL_GPL(dm_sm_checker_create_fresh);
/*----------------------------------------------------------------*/
#else
struct dm_space_map *dm_sm_checker_create(struct dm_space_map *sm)
{
return sm;
}
EXPORT_SYMBOL_GPL(dm_sm_checker_create);
struct dm_space_map *dm_sm_checker_create_fresh(struct dm_space_map *sm)
{
return sm;
}
EXPORT_SYMBOL_GPL(dm_sm_checker_create_fresh);
/*----------------------------------------------------------------*/
#endif
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef SNAPSHOTS_SPACE_MAP_CHECKER_H
#define SNAPSHOTS_SPACE_MAP_CHECKER_H
#include "dm-space-map.h"
/*----------------------------------------------------------------*/
/*
* This space map wraps a real on-disk space map, and verifies all of its
* operations. It uses a lot of memory, so only use if you have a specific
* problem that you're debugging.
*
* Ownership of @sm passes.
*/
struct dm_space_map *dm_sm_checker_create(struct dm_space_map *sm);
struct dm_space_map *dm_sm_checker_create_fresh(struct dm_space_map *sm);
/*----------------------------------------------------------------*/
#endif
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef DM_SPACE_MAP_COMMON_H
#define DM_SPACE_MAP_COMMON_H
#include "dm-btree.h"
/*----------------------------------------------------------------*/
/*
* Low level disk format
*
* Bitmap btree
* ------------
*
* Each value stored in the btree is an index_entry. This points to a
* block that is used as a bitmap. Within the bitmap we hold 2 bits per
* entry, which represent UNUSED = 0, REF_COUNT = 1, REF_COUNT = 2 and
* REF_COUNT = many.
*
* Refcount btree
* --------------
*
* Any entry that has a ref count higher than 2 gets entered in the ref
* count tree. The leaf values for this tree are the 32-bit ref counts.
*/
struct disk_index_entry {
__le64 blocknr;
__le32 nr_free;
__le32 none_free_before;
} __packed;
#define MAX_METADATA_BITMAPS 255
struct disk_metadata_index {
__le32 csum;
__le32 padding;
__le64 blocknr;
struct disk_index_entry index[MAX_METADATA_BITMAPS];
} __packed;
struct ll_disk;
typedef int (*load_ie_fn)(struct ll_disk *ll, dm_block_t index, struct disk_index_entry *result);
typedef int (*save_ie_fn)(struct ll_disk *ll, dm_block_t index, struct disk_index_entry *ie);
typedef int (*init_index_fn)(struct ll_disk *ll);
typedef int (*open_index_fn)(struct ll_disk *ll);
typedef dm_block_t (*max_index_entries_fn)(struct ll_disk *ll);
typedef int (*commit_fn)(struct ll_disk *ll);
struct ll_disk {
struct dm_transaction_manager *tm;
struct dm_btree_info bitmap_info;
struct dm_btree_info ref_count_info;
uint32_t block_size;
uint32_t entries_per_block;
dm_block_t nr_blocks;
dm_block_t nr_allocated;
/*
* bitmap_root may be a btree root or a simple index.
*/
dm_block_t bitmap_root;
dm_block_t ref_count_root;
struct disk_metadata_index mi_le;
load_ie_fn load_ie;
save_ie_fn save_ie;
init_index_fn init_index;
open_index_fn open_index;
max_index_entries_fn max_entries;
commit_fn commit;
};
struct disk_sm_root {
__le64 nr_blocks;
__le64 nr_allocated;
__le64 bitmap_root;
__le64 ref_count_root;
} __packed;
#define ENTRIES_PER_BYTE 4
struct disk_bitmap_header {
__le32 csum;
__le32 not_used;
__le64 blocknr;
} __packed;
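/*
 * For illustration (a sketch, not an exported helper): with 2 bits per
 * entry, the packed count for block b within its bitmap data could be
 * read as below; a value of 3 means "many", so the actual count lives
 * in the ref count tree:
 *
 *	uint8_t byte = bitmap[b / ENTRIES_PER_BYTE];
 *	unsigned shift = (b % ENTRIES_PER_BYTE) * 2;
 *	unsigned packed = (byte >> shift) & 3;
 */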
enum allocation_event {
SM_NONE,
SM_ALLOC,
SM_FREE,
};
/*----------------------------------------------------------------*/
int sm_ll_extend(struct ll_disk *ll, dm_block_t extra_blocks);
int sm_ll_lookup_bitmap(struct ll_disk *ll, dm_block_t b, uint32_t *result);
int sm_ll_lookup(struct ll_disk *ll, dm_block_t b, uint32_t *result);
int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin,
dm_block_t end, dm_block_t *result);
int sm_ll_insert(struct ll_disk *ll, dm_block_t b, uint32_t ref_count, enum allocation_event *ev);
int sm_ll_inc(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev);
int sm_ll_dec(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev);
int sm_ll_commit(struct ll_disk *ll);
int sm_ll_new_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm);
int sm_ll_open_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm,
void *root_le, size_t len);
int sm_ll_new_disk(struct ll_disk *ll, struct dm_transaction_manager *tm);
int sm_ll_open_disk(struct ll_disk *ll, struct dm_transaction_manager *tm,
void *root_le, size_t len);
/*----------------------------------------------------------------*/
#endif /* DM_SPACE_MAP_COMMON_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#include "dm-space-map-checker.h"
#include "dm-space-map-common.h"
#include "dm-space-map-disk.h"
#include "dm-space-map.h"
#include "dm-transaction-manager.h"
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/device-mapper.h>
#define DM_MSG_PREFIX "space map disk"
/*----------------------------------------------------------------*/
/*
* Space map interface.
*/
struct sm_disk {
struct dm_space_map sm;
struct ll_disk ll;
struct ll_disk old_ll;
dm_block_t begin;
dm_block_t nr_allocated_this_transaction;
};
static void sm_disk_destroy(struct dm_space_map *sm)
{
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
kfree(smd);
}
static int sm_disk_extend(struct dm_space_map *sm, dm_block_t extra_blocks)
{
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
return sm_ll_extend(&smd->ll, extra_blocks);
}
static int sm_disk_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count)
{
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
*count = smd->old_ll.nr_blocks;
return 0;
}
static int sm_disk_get_nr_free(struct dm_space_map *sm, dm_block_t *count)
{
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
*count = (smd->old_ll.nr_blocks - smd->old_ll.nr_allocated) - smd->nr_allocated_this_transaction;
return 0;
}
static int sm_disk_get_count(struct dm_space_map *sm, dm_block_t b,
uint32_t *result)
{
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
return sm_ll_lookup(&smd->ll, b, result);
}
static int sm_disk_count_is_more_than_one(struct dm_space_map *sm, dm_block_t b,
int *result)
{
int r;
uint32_t count;
r = sm_disk_get_count(sm, b, &count);
if (r)
return r;
	*result = count > 1;
	return 0;
}
static int sm_disk_set_count(struct dm_space_map *sm, dm_block_t b,
uint32_t count)
{
int r;
uint32_t old_count;
enum allocation_event ev;
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
r = sm_ll_insert(&smd->ll, b, count, &ev);
if (!r) {
switch (ev) {
case SM_NONE:
break;
case SM_ALLOC:
/*
* This _must_ be free in the prior transaction
* otherwise we've lost atomicity.
*/
smd->nr_allocated_this_transaction++;
break;
case SM_FREE:
/*
* It's only free if it's also free in the last
* transaction.
*/
r = sm_ll_lookup(&smd->old_ll, b, &old_count);
if (r)
return r;
if (!old_count)
smd->nr_allocated_this_transaction--;
break;
}
}
return r;
}
static int sm_disk_inc_block(struct dm_space_map *sm, dm_block_t b)
{
int r;
enum allocation_event ev;
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
r = sm_ll_inc(&smd->ll, b, &ev);
if (!r && (ev == SM_ALLOC))
/*
* This _must_ be free in the prior transaction
* otherwise we've lost atomicity.
*/
smd->nr_allocated_this_transaction++;
return r;
}
static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
{
int r;
uint32_t old_count;
enum allocation_event ev;
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
r = sm_ll_dec(&smd->ll, b, &ev);
if (!r && (ev == SM_FREE)) {
/*
* It's only free if it's also free in the last
* transaction.
*/
r = sm_ll_lookup(&smd->old_ll, b, &old_count);
if (r)
return r;
if (!old_count)
smd->nr_allocated_this_transaction--;
}
return r;
}
static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
{
int r;
enum allocation_event ev;
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
/* FIXME: we should loop round a couple of times */
r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
if (r)
return r;
smd->begin = *b + 1;
r = sm_ll_inc(&smd->ll, *b, &ev);
if (!r) {
BUG_ON(ev != SM_ALLOC);
smd->nr_allocated_this_transaction++;
}
return r;
}
static int sm_disk_commit(struct dm_space_map *sm)
{
int r;
dm_block_t nr_free;
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
r = sm_disk_get_nr_free(sm, &nr_free);
if (r)
return r;
r = sm_ll_commit(&smd->ll);
if (r)
return r;
memcpy(&smd->old_ll, &smd->ll, sizeof(smd->old_ll));
smd->begin = 0;
smd->nr_allocated_this_transaction = 0;
r = sm_disk_get_nr_free(sm, &nr_free);
if (r)
return r;
return 0;
}
static int sm_disk_root_size(struct dm_space_map *sm, size_t *result)
{
*result = sizeof(struct disk_sm_root);
return 0;
}
static int sm_disk_copy_root(struct dm_space_map *sm, void *where_le, size_t max)
{
struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
struct disk_sm_root root_le;
root_le.nr_blocks = cpu_to_le64(smd->ll.nr_blocks);
root_le.nr_allocated = cpu_to_le64(smd->ll.nr_allocated);
root_le.bitmap_root = cpu_to_le64(smd->ll.bitmap_root);
root_le.ref_count_root = cpu_to_le64(smd->ll.ref_count_root);
if (max < sizeof(root_le))
return -ENOSPC;
memcpy(where_le, &root_le, sizeof(root_le));
return 0;
}
/*----------------------------------------------------------------*/
static struct dm_space_map ops = {
.destroy = sm_disk_destroy,
.extend = sm_disk_extend,
.get_nr_blocks = sm_disk_get_nr_blocks,
.get_nr_free = sm_disk_get_nr_free,
.get_count = sm_disk_get_count,
.count_is_more_than_one = sm_disk_count_is_more_than_one,
.set_count = sm_disk_set_count,
.inc_block = sm_disk_inc_block,
.dec_block = sm_disk_dec_block,
.new_block = sm_disk_new_block,
.commit = sm_disk_commit,
.root_size = sm_disk_root_size,
.copy_root = sm_disk_copy_root
};
static struct dm_space_map *dm_sm_disk_create_real(
struct dm_transaction_manager *tm,
dm_block_t nr_blocks)
{
int r;
struct sm_disk *smd;
smd = kmalloc(sizeof(*smd), GFP_KERNEL);
if (!smd)
return ERR_PTR(-ENOMEM);
smd->begin = 0;
smd->nr_allocated_this_transaction = 0;
memcpy(&smd->sm, &ops, sizeof(smd->sm));
r = sm_ll_new_disk(&smd->ll, tm);
if (r)
goto bad;
r = sm_ll_extend(&smd->ll, nr_blocks);
if (r)
goto bad;
r = sm_disk_commit(&smd->sm);
if (r)
goto bad;
return &smd->sm;
bad:
kfree(smd);
return ERR_PTR(r);
}
struct dm_space_map *dm_sm_disk_create(struct dm_transaction_manager *tm,
dm_block_t nr_blocks)
{
struct dm_space_map *sm = dm_sm_disk_create_real(tm, nr_blocks);
return dm_sm_checker_create_fresh(sm);
}
EXPORT_SYMBOL_GPL(dm_sm_disk_create);
static struct dm_space_map *dm_sm_disk_open_real(
struct dm_transaction_manager *tm,
void *root_le, size_t len)
{
int r;
struct sm_disk *smd;
smd = kmalloc(sizeof(*smd), GFP_KERNEL);
if (!smd)
return ERR_PTR(-ENOMEM);
smd->begin = 0;
smd->nr_allocated_this_transaction = 0;
memcpy(&smd->sm, &ops, sizeof(smd->sm));
r = sm_ll_open_disk(&smd->ll, tm, root_le, len);
if (r)
goto bad;
r = sm_disk_commit(&smd->sm);
if (r)
goto bad;
return &smd->sm;
bad:
kfree(smd);
return ERR_PTR(r);
}
struct dm_space_map *dm_sm_disk_open(struct dm_transaction_manager *tm,
void *root_le, size_t len)
{
return dm_sm_checker_create(
dm_sm_disk_open_real(tm, root_le, len));
}
EXPORT_SYMBOL_GPL(dm_sm_disk_open);
/*----------------------------------------------------------------*/
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef _LINUX_DM_SPACE_MAP_DISK_H
#define _LINUX_DM_SPACE_MAP_DISK_H
#include "dm-block-manager.h"
struct dm_space_map;
struct dm_transaction_manager;
/*
* Unfortunately we have to use two-phase construction due to the cycle
* between the tm and sm.
*/
struct dm_space_map *dm_sm_disk_create(struct dm_transaction_manager *tm,
dm_block_t nr_blocks);
struct dm_space_map *dm_sm_disk_open(struct dm_transaction_manager *tm,
void *root, size_t len);
#endif /* _LINUX_DM_SPACE_MAP_DISK_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef DM_SPACE_MAP_METADATA_H
#define DM_SPACE_MAP_METADATA_H
#include "dm-transaction-manager.h"
/*
* Unfortunately we have to use two-phase construction due to the cycle
* between the tm and sm.
*/
struct dm_space_map *dm_sm_metadata_init(void);
/*
* Create a fresh space map.
*/
int dm_sm_metadata_create(struct dm_space_map *sm,
struct dm_transaction_manager *tm,
dm_block_t nr_blocks,
dm_block_t superblock);
/*
* Open from a previously-recorded root.
*/
int dm_sm_metadata_open(struct dm_space_map *sm,
struct dm_transaction_manager *tm,
void *root_le, size_t len);
#endif /* DM_SPACE_MAP_METADATA_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef _LINUX_DM_SPACE_MAP_H
#define _LINUX_DM_SPACE_MAP_H
#include "dm-block-manager.h"
/*
* struct dm_space_map keeps a record of how many times each block in a device
* is referenced. It needs to be fixed on disk as part of the transaction.
*/
struct dm_space_map {
void (*destroy)(struct dm_space_map *sm);
/*
* You must commit before allocating the newly added space.
*/
int (*extend)(struct dm_space_map *sm, dm_block_t extra_blocks);
/*
* Extensions do not appear in this count until after commit has
* been called.
*/
int (*get_nr_blocks)(struct dm_space_map *sm, dm_block_t *count);
/*
* Space maps must never allocate a block from the previous
* transaction, in case we need to rollback. This complicates the
* semantics of get_nr_free(), it should return the number of blocks
* that are available for allocation _now_. For instance you may
* have blocks with a zero reference count that will not be
* available for allocation until after the next commit.
*/
int (*get_nr_free)(struct dm_space_map *sm, dm_block_t *count);
int (*get_count)(struct dm_space_map *sm, dm_block_t b, uint32_t *result);
int (*count_is_more_than_one)(struct dm_space_map *sm, dm_block_t b,
int *result);
int (*set_count)(struct dm_space_map *sm, dm_block_t b, uint32_t count);
int (*commit)(struct dm_space_map *sm);
int (*inc_block)(struct dm_space_map *sm, dm_block_t b);
int (*dec_block)(struct dm_space_map *sm, dm_block_t b);
/*
* new_block will increment the returned block.
*/
int (*new_block)(struct dm_space_map *sm, dm_block_t *b);
/*
* The root contains all the information needed to fix the space map.
* Generally this info is small, so squirrel it away in a disk block
* along with other info.
*/
int (*root_size)(struct dm_space_map *sm, size_t *result);
int (*copy_root)(struct dm_space_map *sm, void *copy_to_here_le, size_t len);
};
/*----------------------------------------------------------------*/
static inline void dm_sm_destroy(struct dm_space_map *sm)
{
sm->destroy(sm);
}
static inline int dm_sm_extend(struct dm_space_map *sm, dm_block_t extra_blocks)
{
return sm->extend(sm, extra_blocks);
}
static inline int dm_sm_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count)
{
return sm->get_nr_blocks(sm, count);
}
static inline int dm_sm_get_nr_free(struct dm_space_map *sm, dm_block_t *count)
{
return sm->get_nr_free(sm, count);
}
static inline int dm_sm_get_count(struct dm_space_map *sm, dm_block_t b,
uint32_t *result)
{
return sm->get_count(sm, b, result);
}
static inline int dm_sm_count_is_more_than_one(struct dm_space_map *sm,
dm_block_t b, int *result)
{
return sm->count_is_more_than_one(sm, b, result);
}
static inline int dm_sm_set_count(struct dm_space_map *sm, dm_block_t b,
uint32_t count)
{
return sm->set_count(sm, b, count);
}
static inline int dm_sm_commit(struct dm_space_map *sm)
{
return sm->commit(sm);
}
static inline int dm_sm_inc_block(struct dm_space_map *sm, dm_block_t b)
{
return sm->inc_block(sm, b);
}
static inline int dm_sm_dec_block(struct dm_space_map *sm, dm_block_t b)
{
return sm->dec_block(sm, b);
}
static inline int dm_sm_new_block(struct dm_space_map *sm, dm_block_t *b)
{
return sm->new_block(sm, b);
}
static inline int dm_sm_root_size(struct dm_space_map *sm, size_t *result)
{
return sm->root_size(sm, result);
}
static inline int dm_sm_copy_root(struct dm_space_map *sm, void *copy_to_here_le, size_t len)
{
return sm->copy_root(sm, copy_to_here_le, len);
}
#endif /* _LINUX_DM_SPACE_MAP_H */
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#include "dm-transaction-manager.h"
#include "dm-space-map.h"
#include "dm-space-map-checker.h"
#include "dm-space-map-disk.h"
#include "dm-space-map-metadata.h"
#include "dm-persistent-data-internal.h"
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/device-mapper.h>
#define DM_MSG_PREFIX "transaction manager"
/*----------------------------------------------------------------*/
struct shadow_info {
struct hlist_node hlist;
dm_block_t where;
};
/*
* It would be nice if we scaled with the size of the transaction.
*/
#define HASH_SIZE 256
#define HASH_MASK (HASH_SIZE - 1)
struct dm_transaction_manager {
int is_clone;
struct dm_transaction_manager *real;
struct dm_block_manager *bm;
struct dm_space_map *sm;
spinlock_t lock;
struct hlist_head buckets[HASH_SIZE];
};
/*----------------------------------------------------------------*/
static int is_shadow(struct dm_transaction_manager *tm, dm_block_t b)
{
int r = 0;
unsigned bucket = dm_hash_block(b, HASH_MASK);
struct shadow_info *si;
struct hlist_node *n;
spin_lock(&tm->lock);
hlist_for_each_entry(si, n, tm->buckets + bucket, hlist)
if (si->where == b) {
r = 1;
break;
}
spin_unlock(&tm->lock);
return r;
}
/*
* This can silently fail if there's no memory. We're ok with this since
* creating redundant shadows causes no harm.
*/
static void insert_shadow(struct dm_transaction_manager *tm, dm_block_t b)
{
unsigned bucket;
struct shadow_info *si;
si = kmalloc(sizeof(*si), GFP_NOIO);
if (si) {
si->where = b;
bucket = dm_hash_block(b, HASH_MASK);
spin_lock(&tm->lock);
hlist_add_head(&si->hlist, tm->buckets + bucket);
spin_unlock(&tm->lock);
}
}
static void wipe_shadow_table(struct dm_transaction_manager *tm)
{
struct shadow_info *si;
struct hlist_node *n, *tmp;
struct hlist_head *bucket;
int i;
spin_lock(&tm->lock);
for (i = 0; i < HASH_SIZE; i++) {
bucket = tm->buckets + i;
hlist_for_each_entry_safe(si, n, tmp, bucket, hlist)
kfree(si);
INIT_HLIST_HEAD(bucket);
}
spin_unlock(&tm->lock);
}
/*----------------------------------------------------------------*/
static struct dm_transaction_manager *dm_tm_create(struct dm_block_manager *bm,
struct dm_space_map *sm)
{
int i;
struct dm_transaction_manager *tm;
tm = kmalloc(sizeof(*tm), GFP_KERNEL);
if (!tm)
return ERR_PTR(-ENOMEM);
tm->is_clone = 0;
tm->real = NULL;
tm->bm = bm;
tm->sm = sm;
spin_lock_init(&tm->lock);
for (i = 0; i < HASH_SIZE; i++)
INIT_HLIST_HEAD(tm->buckets + i);
return tm;
}
struct dm_transaction_manager *dm_tm_create_non_blocking_clone(struct dm_transaction_manager *real)
{
struct dm_transaction_manager *tm;
tm = kmalloc(sizeof(*tm), GFP_KERNEL);
if (tm) {
tm->is_clone = 1;
tm->real = real;
}
return tm;
}
EXPORT_SYMBOL_GPL(dm_tm_create_non_blocking_clone);
void dm_tm_destroy(struct dm_transaction_manager *tm)
{
kfree(tm);
}
EXPORT_SYMBOL_GPL(dm_tm_destroy);
int dm_tm_pre_commit(struct dm_transaction_manager *tm)
{
int r;
if (tm->is_clone)
return -EWOULDBLOCK;
r = dm_sm_commit(tm->sm);
if (r < 0)
return r;
return 0;
}
EXPORT_SYMBOL_GPL(dm_tm_pre_commit);
int dm_tm_commit(struct dm_transaction_manager *tm, struct dm_block *root)
{
if (tm->is_clone)
return -EWOULDBLOCK;
wipe_shadow_table(tm);
return dm_bm_flush_and_unlock(tm->bm, root);
}
EXPORT_SYMBOL_GPL(dm_tm_commit);
int dm_tm_new_block(struct dm_transaction_manager *tm,
struct dm_block_validator *v,
struct dm_block **result)
{
int r;
dm_block_t new_block;
if (tm->is_clone)
return -EWOULDBLOCK;
r = dm_sm_new_block(tm->sm, &new_block);
if (r < 0)
return r;
r = dm_bm_write_lock_zero(tm->bm, new_block, v, result);
if (r < 0) {
dm_sm_dec_block(tm->sm, new_block);
return r;
}
/*
* New blocks count as shadows in that they don't need to be
* shadowed again.
*/
insert_shadow(tm, new_block);
return 0;
}
static int __shadow_block(struct dm_transaction_manager *tm, dm_block_t orig,
struct dm_block_validator *v,
struct dm_block **result)
{
int r;
dm_block_t new;
struct dm_block *orig_block;
r = dm_sm_new_block(tm->sm, &new);
if (r < 0)
return r;
r = dm_sm_dec_block(tm->sm, orig);
if (r < 0)
return r;
r = dm_bm_read_lock(tm->bm, orig, v, &orig_block);
if (r < 0)
return r;
r = dm_bm_unlock_move(orig_block, new);
if (r < 0) {
dm_bm_unlock(orig_block);
return r;
}
return dm_bm_write_lock(tm->bm, new, v, result);
}
int dm_tm_shadow_block(struct dm_transaction_manager *tm, dm_block_t orig,
struct dm_block_validator *v, struct dm_block **result,
int *inc_children)
{
int r;
if (tm->is_clone)
return -EWOULDBLOCK;
r = dm_sm_count_is_more_than_one(tm->sm, orig, inc_children);
if (r < 0)
return r;
if (is_shadow(tm, orig) && !*inc_children)
return dm_bm_write_lock(tm->bm, orig, v, result);
r = __shadow_block(tm, orig, v, result);
if (r < 0)
return r;
insert_shadow(tm, dm_block_location(*result));
return r;
}
int dm_tm_read_lock(struct dm_transaction_manager *tm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **blk)
{
if (tm->is_clone)
return dm_bm_read_try_lock(tm->real->bm, b, v, blk);
return dm_bm_read_lock(tm->bm, b, v, blk);
}
int dm_tm_unlock(struct dm_transaction_manager *tm, struct dm_block *b)
{
return dm_bm_unlock(b);
}
EXPORT_SYMBOL_GPL(dm_tm_unlock);
void dm_tm_inc(struct dm_transaction_manager *tm, dm_block_t b)
{
/*
* The non-blocking clone doesn't support this.
*/
BUG_ON(tm->is_clone);
dm_sm_inc_block(tm->sm, b);
}
EXPORT_SYMBOL_GPL(dm_tm_inc);
void dm_tm_dec(struct dm_transaction_manager *tm, dm_block_t b)
{
/*
* The non-blocking clone doesn't support this.
*/
BUG_ON(tm->is_clone);
dm_sm_dec_block(tm->sm, b);
}
EXPORT_SYMBOL_GPL(dm_tm_dec);
int dm_tm_ref(struct dm_transaction_manager *tm, dm_block_t b,
uint32_t *result)
{
if (tm->is_clone)
return -EWOULDBLOCK;
return dm_sm_get_count(tm->sm, b, result);
}
struct dm_block_manager *dm_tm_get_bm(struct dm_transaction_manager *tm)
{
return tm->bm;
}
/*----------------------------------------------------------------*/
static int dm_tm_create_internal(struct dm_block_manager *bm,
dm_block_t sb_location,
struct dm_block_validator *sb_validator,
size_t root_offset, size_t root_max_len,
struct dm_transaction_manager **tm,
struct dm_space_map **sm,
struct dm_block **sblock,
int create)
{
int r;
struct dm_space_map *inner;
inner = dm_sm_metadata_init();
if (IS_ERR(inner))
return PTR_ERR(inner);
*tm = dm_tm_create(bm, inner);
if (IS_ERR(*tm)) {
dm_sm_destroy(inner);
return PTR_ERR(*tm);
}
if (create) {
r = dm_bm_write_lock_zero(dm_tm_get_bm(*tm), sb_location,
sb_validator, sblock);
if (r < 0) {
DMERR("couldn't lock superblock");
goto bad1;
}
r = dm_sm_metadata_create(inner, *tm, dm_bm_nr_blocks(bm),
sb_location);
if (r) {
DMERR("couldn't create metadata space map");
goto bad2;
}
*sm = dm_sm_checker_create(inner);
if (!*sm)
goto bad2;
} else {
r = dm_bm_write_lock(dm_tm_get_bm(*tm), sb_location,
sb_validator, sblock);
if (r < 0) {
DMERR("couldn't lock superblock");
goto bad1;
}
r = dm_sm_metadata_open(inner, *tm,
dm_block_data(*sblock) + root_offset,
root_max_len);
if (r) {
DMERR("couldn't open metadata space map");
goto bad2;
}
*sm = dm_sm_checker_create(inner);
if (!*sm)
goto bad2;
}
return 0;
bad2:
dm_tm_unlock(*tm, *sblock);
bad1:
dm_tm_destroy(*tm);
dm_sm_destroy(inner);
return r;
}
int dm_tm_create_with_sm(struct dm_block_manager *bm, dm_block_t sb_location,
struct dm_block_validator *sb_validator,
struct dm_transaction_manager **tm,
struct dm_space_map **sm, struct dm_block **sblock)
{
return dm_tm_create_internal(bm, sb_location, sb_validator,
0, 0, tm, sm, sblock, 1);
}
EXPORT_SYMBOL_GPL(dm_tm_create_with_sm);
int dm_tm_open_with_sm(struct dm_block_manager *bm, dm_block_t sb_location,
struct dm_block_validator *sb_validator,
size_t root_offset, size_t root_max_len,
struct dm_transaction_manager **tm,
struct dm_space_map **sm, struct dm_block **sblock)
{
return dm_tm_create_internal(bm, sb_location, sb_validator, root_offset,
root_max_len, tm, sm, sblock, 0);
}
EXPORT_SYMBOL_GPL(dm_tm_open_with_sm);
/*----------------------------------------------------------------*/