Commit 5306892a authored by Eric Biggers

fsverity: support verification with tree block size < PAGE_SIZE

Add support for verifying data from verity files whose Merkle tree block
size is less than the page size.  The main use case for this is to allow
a single Merkle tree block size to be used across all systems, so that
only one set of fsverity file digests and signatures is needed.

To do this, eliminate various assumptions that the Merkle tree block
size and the page size are the same:

- Make fsverity_verify_page() a wrapper around a new function
  fsverity_verify_blocks() which verifies one or more blocks in a page.

- When a Merkle tree block is needed, get the corresponding page and
  only verify and use the needed portion.  (The Merkle tree continues to
  be read and cached in page-sized chunks; that doesn't need to change.)

- When the Merkle tree block size and page size differ, use a bitmap
  fsverity_info::hash_block_verified to keep track of which Merkle tree
  blocks have been verified, as PageChecked cannot be used directly.
  (A brief sketch of this bookkeeping follows the commit metadata below.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-7-ebiggers@kernel.org
parent f45555bf
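As a rough illustration of the bitmap bookkeeping described in the last bullet of the commit message above (the helper names here are hypothetical; the real verification code is in fs/verity/verify.c, whose diff is collapsed further down), checking and recording a hash block's "verified" state reduces to ordinary kernel bitmap operations on fsverity_info::hash_block_verified:

        /*
         * Sketch only: hypothetical helpers, not code from this commit.
         * hash_block_verified holds one bit per Merkle tree block.
         */
        static bool sketch_hash_block_verified(const struct fsverity_info *vi,
                                               unsigned long hblock_idx)
        {
                return test_bit(hblock_idx, vi->hash_block_verified);
        }

        static void sketch_mark_hash_block_verified(struct fsverity_info *vi,
                                                    unsigned long hblock_idx)
        {
                set_bit(hblock_idx, vi->hash_block_verified);
        }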
@@ -572,47 +572,44 @@ For filesystems using Linux's pagecache, the ``->read_folio()`` and
 are marked Uptodate.  Merely hooking ``->read_iter()`` would be
 insufficient, since ``->read_iter()`` is not used for memory maps.

-Therefore, fs/verity/ provides a function fsverity_verify_page() which
-verifies a page that has been read into the pagecache of a verity
-inode, but is still locked and not Uptodate, so it's not yet readable
-by userspace.  As needed to do the verification,
-fsverity_verify_page() will call back into the filesystem to read
-Merkle tree pages via fsverity_operations::read_merkle_tree_page().
+Therefore, fs/verity/ provides the function fsverity_verify_blocks()
+which verifies data that has been read into the pagecache of a verity
+inode.  The containing page must still be locked and not Uptodate, so
+it's not yet readable by userspace.  As needed to do the verification,
+fsverity_verify_blocks() will call back into the filesystem to read
+hash blocks via fsverity_operations::read_merkle_tree_page().

-fsverity_verify_page() returns false if verification failed; in this
+fsverity_verify_blocks() returns false if verification failed; in this
 case, the filesystem must not set the page Uptodate.  Following this,
 as per the usual Linux pagecache behavior, attempts by userspace to
 read() from the part of the file containing the page will fail with
 EIO, and accesses to the page within a memory map will raise SIGBUS.

-fsverity_verify_page() currently only supports the case where the
-Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).
-
-In principle, fsverity_verify_page() verifies the entire path in the
-Merkle tree from the data page to the root hash.  However, for
-efficiency the filesystem may cache the hash pages.  Therefore,
-fsverity_verify_page() only ascends the tree reading hash pages until
-an already-verified hash page is seen, as indicated by the PageChecked
-bit being set.  It then verifies the path to that page.
+In principle, verifying a data block requires verifying the entire
+path in the Merkle tree from the data block to the root hash.
+However, for efficiency the filesystem may cache the hash blocks.
+Therefore, fsverity_verify_blocks() only ascends the tree reading hash
+blocks until an already-verified hash block is seen.  It then verifies
+the path to that block.

 This optimization, which is also used by dm-verity, results in
 excellent sequential read performance.  This is because usually (e.g.
-127 in 128 times for 4K blocks and SHA-256) the hash page from the
+127 in 128 times for 4K blocks and SHA-256) the hash block from the
 bottom level of the tree will already be cached and checked from
-reading a previous data page.  However, random reads perform worse.
+reading a previous data block.  However, random reads perform worse.

 Block device based filesystems
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
 the pagecache, so the above subsection applies too.  However, they
-also usually read many pages from a file at once, grouped into a
+also usually read many data blocks from a file at once, grouped into a
 structure called a "bio".  To make it easier for these types of
 filesystems to support fs-verity, fs/verity/ also provides a function
-fsverity_verify_bio() which verifies all pages in a bio.
+fsverity_verify_bio() which verifies all data blocks in a bio.

 ext4 and f2fs also support encryption.  If a verity file is also
-encrypted, the pages must be decrypted before being verified.  To
+encrypted, the data must be decrypted before being verified.  To
 support this, these filesystems allocate a "post-read context" for
 each bio and store it in ``->bi_private``::

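The "only ascend until an already-verified hash block is seen" walk described in the hunk above can be modeled by the following self-contained sketch; the struct, helper names, and index scheme here are assumptions made for illustration, not fs/verity/'s internal layout:

        #include <stdbool.h>
        #include <stdint.h>

        struct sketch_tree {
                unsigned int num_levels;      /* hash levels above the data */
                unsigned int log_arity;       /* log2(hashes per tree block) */
                const uint8_t *verified;      /* one bit per hash block */
                unsigned long level_start[8]; /* first block index of each level */
        };

        static bool sketch_verified(const struct sketch_tree *t, unsigned long i)
        {
                return (t->verified[i / 8] >> (i % 8)) & 1;
        }

        /*
         * How many tree levels must actually be read and hashed for one data
         * block: ascend level by level, stopping at the first hash block that
         * is already marked verified (or at the root).
         */
        static unsigned int sketch_levels_to_check(const struct sketch_tree *t,
                                                   uint64_t data_block)
        {
                unsigned int level;

                for (level = 0; level < t->num_levels; level++) {
                        unsigned long idx = data_block >> ((level + 1) * t->log_arity);

                        if (sketch_verified(t, t->level_start[level] + idx))
                                break;
                }
                return level;
        }

In the common sequential-read case this loop exits immediately at level 0, which is what gives the 127-in-128 figure quoted above for SHA-256 with 4K blocks.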
@@ -631,10 +628,10 @@ verification.  Finally, pages where no decryption or verity error
 occurred are marked Uptodate, and the pages are unlocked.

 On many filesystems, files can contain holes.  Normally,
-``->readahead()`` simply zeroes holes and sets the corresponding pages
-Uptodate; no bios are issued.  To prevent this case from bypassing
-fs-verity, these filesystems use fsverity_verify_page() to verify hole
-pages.
+``->readahead()`` simply zeroes hole blocks and considers the
+corresponding data to be up-to-date; no bios are issued.  To prevent
+this case from bypassing fs-verity, filesystems use
+fsverity_verify_blocks() to verify hole blocks.

 Filesystems also disable direct I/O on verity files, since otherwise
 direct I/O would bypass fs-verity.
...
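For the hole case above, the calling convention is simple: the filesystem zero-fills the block within the pagecache page and then asks fs-verity to verify just that block. A conceptual sketch follows (the helper is hypothetical and each filesystem's actual readahead code differs; only zero_user() and the new fsverity_verify_blocks() are real APIs):

        /* Sketch only; assumes <linux/highmem.h> and <linux/fsverity.h>. */
        static bool sketch_fill_and_verify_hole_block(struct page *page,
                                                      unsigned int offset_in_page,
                                                      unsigned int block_size)
        {
                /* A hole reads back as zeroes; materialize that in the page... */
                zero_user(page, offset_in_page, block_size);
                /* ...then verify only this block against the Merkle tree. */
                return fsverity_verify_blocks(page, block_size, offset_in_page);
        }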
@@ -42,9 +42,11 @@ struct merkle_tree_params {
        unsigned int digest_size;       /* same as hash_alg->digest_size */
        unsigned int block_size;        /* size of data and tree blocks */
        unsigned int hashes_per_block;  /* number of hashes per tree block */
+       unsigned int blocks_per_page;   /* PAGE_SIZE / block_size */
        u8 log_digestsize;              /* log2(digest_size) */
        u8 log_blocksize;               /* log2(block_size) */
        u8 log_arity;                   /* log2(hashes_per_block) */
+       u8 log_blocks_per_page;         /* log2(blocks_per_page) */
        unsigned int num_levels;        /* number of levels in Merkle tree */
        u64 tree_size;                  /* Merkle tree size in bytes */
        unsigned long tree_pages;       /* Merkle tree size in pages */
@@ -70,9 +72,10 @@ struct fsverity_info {
        u8 root_hash[FS_VERITY_MAX_DIGEST_SIZE];
        u8 file_digest[FS_VERITY_MAX_DIGEST_SIZE];
        const struct inode *inode;
+       unsigned long *hash_block_verified;
+       spinlock_t hash_page_init_lock;
 };

 #define FS_VERITY_MAX_SIGNATURE_SIZE   (FS_VERITY_MAX_DESCRIPTOR_SIZE - \
                                         sizeof(struct fsverity_descriptor))
...
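The two new merkle_tree_params fields are simply the page-to-block conversion factors. For example (hypothetical helper, shown only to illustrate the relationship), a Merkle tree block index splits into the index of the page that caches it plus the block's position within that page:

        static void sketch_locate_hash_block(const struct merkle_tree_params *params,
                                             unsigned long hblock_idx,
                                             unsigned long *page_idx,
                                             unsigned int *block_in_page)
        {
                /* Both sizes are powers of 2 and block_size <= PAGE_SIZE. */
                *page_idx = hblock_idx >> params->log_blocks_per_page;
                *block_in_page = hblock_idx & (params->blocks_per_page - 1);
        }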
@@ -56,7 +56,23 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
                goto out_err;
        }

-       if (log_blocksize != PAGE_SHIFT) {
+       /*
+        * fs/verity/ directly assumes that the Merkle tree block size is a
+        * power of 2 less than or equal to PAGE_SIZE.  Another restriction
+        * arises from the interaction between fs/verity/ and the filesystems
+        * themselves: filesystems expect to be able to verify a single
+        * filesystem block of data at a time.  Therefore, the Merkle tree block
+        * size must also be less than or equal to the filesystem block size.
+        *
+        * The above are the only hard limitations, so in theory the Merkle tree
+        * block size could be as small as twice the digest size.  However,
+        * that's not useful, and it would result in some unusually deep and
+        * large Merkle trees.  So we currently require that the Merkle tree
+        * block size be at least 1024 bytes.  That's small enough to test the
+        * sub-page block case on systems with 4K pages, but not too small.
+        */
+       if (log_blocksize < 10 || log_blocksize > PAGE_SHIFT ||
+           log_blocksize > inode->i_blkbits) {
                fsverity_warn(inode, "Unsupported log_blocksize: %u",
                              log_blocksize);
                err = -EINVAL;
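As a concrete reading of the new check (illustrative numbers, not taken from the patch): on a system with 16K pages (PAGE_SHIFT = 14) and a filesystem using 4K blocks (i_blkbits = 12), the accepted values of log_blocksize are 10, 11, and 12, i.e. Merkle tree blocks of 1024, 2048, or 4096 bytes. 8K and 16K tree blocks would fit in a page but exceed the filesystem block size, and anything under 1024 bytes is rejected by the new minimum.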
@@ -64,6 +80,8 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
        }
        params->log_blocksize = log_blocksize;
        params->block_size = 1 << log_blocksize;
+       params->log_blocks_per_page = PAGE_SHIFT - log_blocksize;
+       params->blocks_per_page = 1 << params->log_blocks_per_page;

        if (WARN_ON(!is_power_of_2(params->digest_size))) {
                err = -EINVAL;
@@ -108,11 +126,19 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
        }

        /*
-        * Since the data, and thus also the Merkle tree, cannot have more than
-        * ULONG_MAX pages, hash block indices can always fit in an
-        * 'unsigned long'.  To be safe, explicitly check for it too.
+        * With block_size != PAGE_SIZE, an in-memory bitmap will need to be
+        * allocated to track the "verified" status of hash blocks.  Don't allow
+        * this bitmap to get too large.  For now, limit it to 1 MiB, which
+        * limits the file size to about 4.4 TB with SHA-256 and 4K blocks.
+        *
+        * Together with the fact that the data, and thus also the Merkle tree,
+        * cannot have more than ULONG_MAX pages, this implies that hash block
+        * indices can always fit in an 'unsigned long'.  But to be safe, we
+        * explicitly check for that too.  Note, this is only for hash block
+        * indices; data block indices might not fit in an 'unsigned long'.
         */
-       if (offset > ULONG_MAX) {
+       if ((params->block_size != PAGE_SIZE && offset > 1 << 23) ||
+           offset > ULONG_MAX) {
                fsverity_err(inode, "Too many blocks in Merkle tree");
                err = -EFBIG;
                goto out_err;
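The 4.4 TB figure in the new comment can be reproduced roughly as follows (approximate; per-level rounding is ignored, and offset here counts Merkle tree blocks):

        1 MiB bitmap              = 2^23 bits    -> at most 2^23 hash blocks
        SHA-256, 4K tree blocks   -> 4096 / 32   = 128 hashes per hash block
        bottom level of the tree  ~ 2^23 * 127/128 ~ 8.3 million hash blocks
        maximum data size         ~ 8.3e6 * 128 * 4096 bytes ~ 4.4 TB

So "offset > 1 << 23" is exactly the one-bit-per-block, 1 MiB bitmap limit.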
@@ -170,7 +196,7 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
                fsverity_err(inode,
                             "Error %d initializing Merkle tree parameters",
                             err);
-               goto out;
+               goto fail;
        }

        memcpy(vi->root_hash, desc->root_hash, vi->tree_params.digest_size);
@@ -179,17 +205,48 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
                                  vi->file_digest);
        if (err) {
                fsverity_err(inode, "Error %d computing file digest", err);
-               goto out;
+               goto fail;
        }

        err = fsverity_verify_signature(vi, desc->signature,
                                        le32_to_cpu(desc->sig_size));
-out:
-       if (err) {
-               fsverity_free_info(vi);
-               vi = ERR_PTR(err);
+       if (err)
+               goto fail;
+
+       if (vi->tree_params.block_size != PAGE_SIZE) {
+               /*
+                * When the Merkle tree block size and page size differ, we use
+                * a bitmap to keep track of which hash blocks have been
+                * verified.  This bitmap must contain one bit per hash block,
+                * including alignment to a page boundary at the end.
+                *
+                * Eventually, to support extremely large files in an efficient
+                * way, it might be necessary to make pages of this bitmap
+                * reclaimable.  But for now, simply allocating the whole bitmap
+                * is a simple solution that works well on the files on which
+                * fsverity is realistically used.  E.g., with SHA-256 and 4K
+                * blocks, a 100MB file only needs a 24-byte bitmap, and the
+                * bitmap for any file under 17GB fits in a 4K page.
+                */
+               unsigned long num_bits =
+                       vi->tree_params.tree_pages <<
+                       vi->tree_params.log_blocks_per_page;
+
+               vi->hash_block_verified = kvcalloc(BITS_TO_LONGS(num_bits),
+                                                  sizeof(unsigned long),
+                                                  GFP_KERNEL);
+               if (!vi->hash_block_verified) {
+                       err = -ENOMEM;
+                       goto fail;
+               }
+               spin_lock_init(&vi->hash_page_init_lock);
        }
+
        return vi;
+
+fail:
+       fsverity_free_info(vi);
+       return ERR_PTR(err);
 }

 void fsverity_set_info(struct inode *inode, struct fsverity_info *vi)
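As a sanity check on the sizing comment added above (approximate, ignoring per-level rounding):

        bits in a 4K page of bitmap  = 4096 * 8       = 32768 hash blocks
        SHA-256, 4K tree blocks      -> 128 hashes per hash block
        bottom-level share           ~ 32768 * 127/128 ~ 32512 hash blocks
        data covered                 ~ 32512 * 128 * 4096 bytes ~ 17 GB

which matches the "under 17GB" figure. kvcalloc()/kvfree() are used rather than kcalloc()/kfree() so that an unusually large bitmap can fall back to vmalloc().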
@@ -216,6 +273,7 @@ void fsverity_free_info(struct fsverity_info *vi)
        if (!vi)
                return;
        kfree(vi->tree_params.hashstate);
+       kvfree(vi->hash_block_verified);
        kmem_cache_free(fsverity_info_cachep, vi);
 }
...
This diff is collapsed.
@@ -170,7 +170,8 @@ int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);

 /* verify.c */

-bool fsverity_verify_page(struct page *page);
+bool fsverity_verify_blocks(struct page *page, unsigned int len,
+                           unsigned int offset);
 void fsverity_verify_bio(struct bio *bio);
 void fsverity_enqueue_verify_work(struct work_struct *work);
@@ -230,7 +231,8 @@ static inline int fsverity_ioctl_read_metadata(struct file *filp,

 /* verify.c */

-static inline bool fsverity_verify_page(struct page *page)
+static inline bool fsverity_verify_blocks(struct page *page, unsigned int len,
+                                         unsigned int offset)
 {
        WARN_ON(1);
        return false;
@@ -248,6 +250,11 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)

 #endif /* !CONFIG_FS_VERITY */

+static inline bool fsverity_verify_page(struct page *page)
+{
+       return fsverity_verify_blocks(page, PAGE_SIZE, 0);
+}
+
 /**
  * fsverity_active() - do reads from the inode need to go through fs-verity?
  * @inode: inode to check
...