Commit fafe1e39 authored by Linus Torvalds

Merge tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS updates from David Howells:
 "Use the new netfs lib.

  Begin the process of overhauling AFS's use of the fscache API and
  introducing support for features such as Transparent Huge Pages
  (THPs).

   - Add some support for THPs, including using core VM helper functions
     to find details of pages.

   - Use the ITER_XARRAY I/O iterator to mediate access to the pagecache,
     as this handles THPs and doesn't require allocation of large bvec
     arrays (a usage sketch follows the Link below).

   - Delegate address_space read/pre-write I/O methods for AFS to the
     netfs helper library. A method is provided to the library that
     allows it to issue a read against the server.

     This includes a change in use for PG_fscache (it now indicates a
     DIO write in progress from the marked page), so a number of waits
     need to be deployed for it.

   - Split the core AFS writeback function to make it easier to modify
     in future patches to handle writing to the cache. [This might
     feasibly make more sense moved out into my fscache-iter branch].

  I've tested these with "xfstests -g quick" against an AFS volume
  (xfstests needs patching to make it work). With this, AFS without a
  cache passes all expected xfstests; with a cache, there's an extra
  failure, but that's also there before these patches. Fixing that
  probably requires a greater overhaul (as can be found on my
  fscache-iter branch, but that's for a later time).

  Thanks should go to Marc Dionne and Jeff Altman of AuriStor for
  exercising the patches in their test farm also"

Link: https://lore.kernel.org/lkml/3785063.1619482429@warthog.procyon.org.uk/
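
For context on the ITER_XARRAY bullet above, here is a minimal sketch of how a filesystem can build such an iterator straight over its pagecache xarray instead of allocating a bvec array. The wrapper afs_example_iter() is hypothetical; iov_iter_xarray() is the interface the series relies on.

	/* Hypothetical wrapper; iov_iter_xarray() is the real interface. */
	#include <linux/pagemap.h>
	#include <linux/uio.h>

	static void afs_example_iter(struct inode *inode, loff_t pos, size_t len,
				     struct iov_iter *iter)
	{
		/* Describe [pos, pos + len) of the inode's pagecache directly;
		 * the iterator walks the xarray itself, so compound pages
		 * (THPs) need no special casing and no large bvec[] is needed.
		 */
		iov_iter_xarray(iter, READ, &inode->i_mapping->i_pages, pos, len);
	}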

* tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Use the netfs_write_begin() helper
  afs: Use new netfs lib read helper API
  afs: Use the fs operation ops to handle FetchData completion
  afs: Prepare for use of THPs
  afs: Extract writeback extension into its own function
  afs: Wait on PG_fscache before modifying/releasing a page
  afs: Use ITER_XARRAY for writing
  afs: Set up the iov_iter before calling afs_extract_data()
  afs: Log remote unmarshalling errors
  afs: Don't truncate iter during data fetch
  afs: Move key to afs_read struct
  afs: Print the operation debug_id when logging an unexpected data version
  afs: Pass page into dirty region helpers to provide THP size
  afs: Disable use of the fscache I/O routines
parents 820c4bae 3003bbd0
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -4,6 +4,7 @@ config AFS_FS
 	depends on INET
 	select AF_RXRPC
 	select DNS_RESOLVER
+	select NETFS_SUPPORT
 	help
 	  If you say Y here, you will get an experimental Andrew File System
 	  driver. It currently only supports unsecured read-only AFS access.
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -102,6 +102,35 @@ struct afs_lookup_cookie {
 	struct afs_fid	fids[50];
 };
 
+/*
+ * Drop the refs that we're holding on the pages we were reading into.  We've
+ * got refs on the first nr_pages pages.
+ */
+static void afs_dir_read_cleanup(struct afs_read *req)
+{
+	struct address_space *mapping = req->vnode->vfs_inode.i_mapping;
+	struct page *page;
+	pgoff_t last = req->nr_pages - 1;
+
+	XA_STATE(xas, &mapping->i_pages, 0);
+
+	if (unlikely(!req->nr_pages))
+		return;
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		BUG_ON(xa_is_value(page));
+		BUG_ON(PageCompound(page));
+		ASSERTCMP(page->mapping, ==, mapping);
+
+		put_page(page);
+	}
+
+	rcu_read_unlock();
+}
+
 /*
  * check that a directory page is valid
  */
@@ -127,7 +156,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 	qty /= sizeof(union afs_xdr_dir_block);
 
 	/* check them */
-	dbuf = kmap(page);
+	dbuf = kmap_atomic(page);
 	for (tmp = 0; tmp < qty; tmp++) {
 		if (dbuf->blocks[tmp].hdr.magic != AFS_DIR_MAGIC) {
 			printk("kAFS: %s(%lx): bad magic %d/%d is %04hx\n",
@@ -146,7 +175,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 			((u8 *)&dbuf->blocks[tmp])[AFS_DIR_BLOCK_SIZE - 1] = 0;
 	}
 
-	kunmap(page);
+	kunmap_atomic(dbuf);
 
 checked:
 	afs_stat_v(dvnode, n_read_dir);
@@ -157,35 +186,74 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 }
 
 /*
- * Check the contents of a directory that we've just read.
+ * Dump the contents of a directory.
  */
-static bool afs_dir_check_pages(struct afs_vnode *dvnode, struct afs_read *req)
+static void afs_dir_dump(struct afs_vnode *dvnode, struct afs_read *req)
 {
 	struct afs_xdr_dir_page *dbuf;
-	unsigned int i, j, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+	struct address_space *mapping = dvnode->vfs_inode.i_mapping;
+	struct page *page;
+	unsigned int i, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+	pgoff_t last = req->nr_pages - 1;
 
-	for (i = 0; i < req->nr_pages; i++)
-		if (!afs_dir_check_page(dvnode, req->pages[i], req->actual_len))
-			goto bad;
-	return true;
+	XA_STATE(xas, &mapping->i_pages, 0);
 
-bad:
-	pr_warn("DIR %llx:%llx f=%llx l=%llx al=%llx r=%llx\n",
+	pr_warn("DIR %llx:%llx f=%llx l=%llx al=%llx\n",
 		dvnode->fid.vid, dvnode->fid.vnode,
-		req->file_size, req->len, req->actual_len, req->remain);
-	pr_warn("DIR %llx %x %x %x\n",
-		req->pos, req->index, req->nr_pages, req->offset);
+		req->file_size, req->len, req->actual_len);
+	pr_warn("DIR %llx %x %zx %zx\n",
+		req->pos, req->nr_pages,
+		req->iter->iov_offset, iov_iter_count(req->iter));
 
-	for (i = 0; i < req->nr_pages; i++) {
-		dbuf = kmap(req->pages[i]);
-		for (j = 0; j < qty; j++) {
-			union afs_xdr_dir_block *block = &dbuf->blocks[j];
+	xas_for_each(&xas, page, last) {
+		if (xas_retry(&xas, page))
+			continue;
 
-			pr_warn("[%02x] %32phN\n", i * qty + j, block);
+		BUG_ON(PageCompound(page));
+		BUG_ON(page->mapping != mapping);
+
+		dbuf = kmap_atomic(page);
+		for (i = 0; i < qty; i++) {
+			union afs_xdr_dir_block *block = &dbuf->blocks[i];
+
+			pr_warn("[%02lx] %32phN\n", page->index * qty + i, block);
 		}
-		kunmap(req->pages[i]);
+		kunmap_atomic(dbuf);
 	}
-	return false;
+}
+
+/*
+ * Check all the pages in a directory.  All the pages are held pinned.
+ */
+static int afs_dir_check(struct afs_vnode *dvnode, struct afs_read *req)
+{
+	struct address_space *mapping = dvnode->vfs_inode.i_mapping;
+	struct page *page;
+	pgoff_t last = req->nr_pages - 1;
+	int ret = 0;
+
+	XA_STATE(xas, &mapping->i_pages, 0);
+
+	if (unlikely(!req->nr_pages))
+		return 0;
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		BUG_ON(PageCompound(page));
+		BUG_ON(page->mapping != mapping);
+
+		if (!afs_dir_check_page(dvnode, page, req->file_size)) {
+			afs_dir_dump(dvnode, req);
+			ret = -EIO;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return ret;
 }
 
 /*
@@ -214,57 +282,57 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 {
 	struct afs_read *req;
 	loff_t i_size;
-	int nr_pages, nr_inline, i, n;
-	int ret = -ENOMEM;
+	int nr_pages, i, n;
+	int ret;
+
+	_enter("");
+
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return ERR_PTR(-ENOMEM);
+
+	refcount_set(&req->usage, 1);
+	req->vnode = dvnode;
+	req->key = key_get(key);
+	req->cleanup = afs_dir_read_cleanup;
 
-retry:
+expand:
 	i_size = i_size_read(&dvnode->vfs_inode);
-	if (i_size < 2048)
-		return ERR_PTR(afs_bad(dvnode, afs_file_error_dir_small));
+	if (i_size < 2048) {
+		ret = afs_bad(dvnode, afs_file_error_dir_small);
+		goto error;
+	}
 	if (i_size > 2048 * 1024) {
 		trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big);
-		return ERR_PTR(-EFBIG);
+		ret = -EFBIG;
+		goto error;
 	}
 
 	_enter("%llu", i_size);
 
-	/* Get a request record to hold the page list.  We want to hold it
-	 * inline if we can, but we don't want to make an order 1 allocation.
-	 */
 	nr_pages = (i_size + PAGE_SIZE - 1) / PAGE_SIZE;
-	nr_inline = nr_pages;
-	if (nr_inline > (PAGE_SIZE - sizeof(*req)) / sizeof(struct page *))
-		nr_inline = 0;
 
-	req = kzalloc(struct_size(req, array, nr_inline), GFP_KERNEL);
-	if (!req)
-		return ERR_PTR(-ENOMEM);
-
-	refcount_set(&req->usage, 1);
-	req->nr_pages = nr_pages;
 	req->actual_len = i_size; /* May change */
 	req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */
 	req->data_version = dvnode->status.data_version; /* May change */
-	if (nr_inline > 0) {
-		req->pages = req->array;
-	} else {
-		req->pages = kcalloc(nr_pages, sizeof(struct page *),
-				     GFP_KERNEL);
-		if (!req->pages)
-			goto error;
-	}
+	iov_iter_xarray(&req->def_iter, READ, &dvnode->vfs_inode.i_mapping->i_pages,
+			0, i_size);
+	req->iter = &req->def_iter;
 
-	/* Get a list of all the pages that hold or will hold the directory
-	 * content.  We need to fill in any gaps that we might find where the
-	 * memory reclaimer has been at work.  If there are any gaps, we will
+	/* Fill in any gaps that we might find where the memory reclaimer has
+	 * been at work and pin all the pages.  If there are any gaps, we will
 	 * need to reread the entire directory contents.
 	 */
-	i = 0;
-	do {
+	i = req->nr_pages;
+	while (i < nr_pages) {
+		struct page *pages[8], *page;
+
 		n = find_get_pages_contig(dvnode->vfs_inode.i_mapping, i,
-					  req->nr_pages - i,
-					  req->pages + i);
-		_debug("find %u at %u/%u", n, i, req->nr_pages);
+					  min_t(unsigned int, nr_pages - i,
+						ARRAY_SIZE(pages)),
+					  pages);
+		_debug("find %u at %u/%u", n, i, nr_pages);
+
 		if (n == 0) {
 			gfp_t gfp = dvnode->vfs_inode.i_mapping->gfp_mask;
@@ -272,22 +340,24 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 			afs_stat_v(dvnode, n_inval);
 
 			ret = -ENOMEM;
-			req->pages[i] = __page_cache_alloc(gfp);
-			if (!req->pages[i])
+			page = __page_cache_alloc(gfp);
+			if (!page)
 				goto error;
-			ret = add_to_page_cache_lru(req->pages[i],
+			ret = add_to_page_cache_lru(page,
 						    dvnode->vfs_inode.i_mapping,
 						    i, gfp);
 			if (ret < 0)
 				goto error;
 
-			attach_page_private(req->pages[i], (void *)1);
-			unlock_page(req->pages[i]);
+			attach_page_private(page, (void *)1);
+			unlock_page(page);
+			req->nr_pages++;
 			i++;
 		} else {
+			req->nr_pages += n;
 			i += n;
 		}
-	} while (i < req->nr_pages);
+	}
 
 	/* If we're going to reload, we need to lock all the pages to prevent
 	 * races.
@@ -305,18 +375,23 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
 		trace_afs_reload_dir(dvnode);
-		ret = afs_fetch_data(dvnode, key, req);
+		ret = afs_fetch_data(dvnode, req);
 		if (ret < 0)
 			goto error_unlock;
 
 		task_io_account_read(PAGE_SIZE * req->nr_pages);
 
-		if (req->len < req->file_size)
-			goto content_has_grown;
+		if (req->len < req->file_size) {
+			/* The content has grown, so we need to expand the
+			 * buffer.
+			 */
+			up_write(&dvnode->validate_lock);
+			goto expand;
+		}
 
 		/* Validate the data we just read. */
-		ret = -EIO;
-		if (!afs_dir_check_pages(dvnode, req))
+		ret = afs_dir_check(dvnode, req);
+		if (ret < 0)
 			goto error_unlock;
 
 		// TODO: Trim excess pages
@@ -334,11 +409,6 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 	afs_put_read(req);
 	_leave(" = %d", ret);
 	return ERR_PTR(ret);
-
-content_has_grown:
-	up_write(&dvnode->validate_lock);
-	afs_put_read(req);
-	goto retry;
 }
 
 /*
@@ -448,6 +518,7 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx,
 	struct afs_read *req;
 	struct page *page;
 	unsigned blkoff, limit;
+	void __rcu **slot;
 	int ret;
 
 	_enter("{%lu},%u,,", dir->i_ino, (unsigned)ctx->pos);
@@ -472,9 +543,15 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx,
 		blkoff = ctx->pos & ~(sizeof(union afs_xdr_dir_block) - 1);
 
 		/* Fetch the appropriate page from the directory and re-add it
-		 * to the LRU.
+		 * to the LRU.  We have all the pages pinned with an extra ref.
 		 */
-		page = req->pages[blkoff / PAGE_SIZE];
+		rcu_read_lock();
+		page = NULL;
+		slot = radix_tree_lookup_slot(&dvnode->vfs_inode.i_mapping->i_pages,
+					      blkoff / PAGE_SIZE);
+		if (slot)
+			page = radix_tree_deref_slot(slot);
+		rcu_read_unlock();
 		if (!page) {
 			ret = afs_bad(dvnode, afs_file_error_dir_missing_page);
 			break;
@@ -2006,6 +2083,6 @@ static void afs_dir_invalidatepage(struct page *page, unsigned int offset,
 	afs_stat_v(dvnode, n_inval);
 
 	/* we clean up only if the entire page is being invalidated */
-	if (offset == 0 && length == PAGE_SIZE)
+	if (offset == 0 && length == thp_size(page))
 		detach_page_private(page);
 }
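
The three new dir.c helpers above (afs_dir_read_cleanup(), afs_dir_dump(), afs_dir_check()) share one walking idiom. A condensed sketch of that pattern; example_walk() is illustrative, while XA_STATE()/xas_for_each()/xas_retry() are the real XArray primitives:

	#include <linux/pagemap.h>
	#include <linux/xarray.h>

	/* Illustrative walk over pages pinned in a mapping. */
	static void example_walk(struct address_space *mapping, pgoff_t last)
	{
		struct page *page;
		XA_STATE(xas, &mapping->i_pages, 0);	/* start at index 0 */

		rcu_read_lock();
		xas_for_each(&xas, page, last) {
			if (xas_retry(&xas, page))	/* raced with a store; retry */
				continue;
			/* ... check, dump or put_page() the page here ... */
		}
		rcu_read_unlock();
	}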
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -14,6 +14,7 @@
 #include <linux/gfp.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/mm.h>
+#include <linux/netfs.h>
 #include "internal.h"
 
 static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
@@ -22,8 +23,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 			       unsigned int length);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 
-static int afs_readpages(struct file *filp, struct address_space *mapping,
-			 struct list_head *pages, unsigned nr_pages);
+static void afs_readahead(struct readahead_control *ractl);
 
 const struct file_operations afs_file_operations = {
 	.open		= afs_open,
@@ -47,7 +47,7 @@ const struct inode_operations afs_file_inode_operations = {
 const struct address_space_operations afs_fs_aops = {
 	.readpage	= afs_readpage,
-	.readpages	= afs_readpages,
+	.readahead	= afs_readahead,
 	.set_page_dirty	= afs_set_page_dirty,
 	.launder_page	= afs_launder_page,
 	.releasepage	= afs_releasepage,
@@ -183,42 +183,51 @@ int afs_release(struct inode *inode, struct file *file)
 	return ret;
 }
 
+/*
+ * Allocate a new read record.
+ */
+struct afs_read *afs_alloc_read(gfp_t gfp)
+{
+	struct afs_read *req;
+
+	req = kzalloc(sizeof(struct afs_read), gfp);
+	if (req)
+		refcount_set(&req->usage, 1);
+
+	return req;
+}
+
 /*
  * Dispose of a ref to a read record.
  */
 void afs_put_read(struct afs_read *req)
 {
-	int i;
-
 	if (refcount_dec_and_test(&req->usage)) {
-		if (req->pages) {
-			for (i = 0; i < req->nr_pages; i++)
-				if (req->pages[i])
-					put_page(req->pages[i]);
-			if (req->pages != req->array)
-				kfree(req->pages);
-		}
+		if (req->cleanup)
+			req->cleanup(req);
+		key_put(req->key);
 		kfree(req);
 	}
 }
 
-#ifdef CONFIG_AFS_FSCACHE
-/*
- * deal with notification that a page was read from the cache
- */
-static void afs_file_readpage_read_complete(struct page *page,
-					    void *data,
-					    int error)
+static void afs_fetch_data_notify(struct afs_operation *op)
 {
-	_enter("%p,%p,%d", page, data, error);
+	struct afs_read *req = op->fetch.req;
+	struct netfs_read_subrequest *subreq = req->subreq;
+	int error = op->error;
 
-	/* if the read completes with an error, we just unlock the page and let
-	 * the VM reissue the readpage */
-	if (!error)
-		SetPageUptodate(page);
-	unlock_page(page);
+	if (error == -ECONNABORTED)
+		error = afs_abort_to_error(op->ac.abort_code);
+	req->error = error;
+
+	if (subreq) {
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+		netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
+		req->subreq = NULL;
+	} else if (req->done) {
+		req->done(req);
+	}
 }
-#endif
 
 static void afs_fetch_data_success(struct afs_operation *op)
 {
@@ -228,10 +237,12 @@ static void afs_fetch_data_success(struct afs_operation *op)
 	afs_vnode_commit_status(op, &op->file[0]);
 	afs_stat_v(vnode, n_fetches);
 	atomic_long_add(op->fetch.req->actual_len, &op->net->n_fetch_bytes);
+	afs_fetch_data_notify(op);
 }
 
 static void afs_fetch_data_put(struct afs_operation *op)
 {
+	op->fetch.req->error = op->error;
 	afs_put_read(op->fetch.req);
 }
 
@@ -240,13 +251,14 @@ static const struct afs_operation_ops afs_fetch_data_operation = {
 	.issue_yfs_rpc	= yfs_fs_fetch_data,
 	.success	= afs_fetch_data_success,
 	.aborted	= afs_check_for_remote_deletion,
+	.failed		= afs_fetch_data_notify,
 	.put		= afs_fetch_data_put,
 };
 
 /*
  * Fetch file data from the volume.
  */
-int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *req)
+int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 {
 	struct afs_operation *op;
 
@@ -255,11 +267,14 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 	       vnode->fid.vid,
 	       vnode->fid.vnode,
 	       vnode->fid.unique,
-	       key_serial(key));
+	       key_serial(req->key));
 
-	op = afs_alloc_operation(key, vnode->volume);
-	if (IS_ERR(op))
+	op = afs_alloc_operation(req->key, vnode->volume);
+	if (IS_ERR(op)) {
+		if (req->subreq)
+			netfs_subreq_terminated(req->subreq, PTR_ERR(op), false);
 		return PTR_ERR(op);
+	}
 
 	afs_op_set_vnode(op, 0, vnode);
 
@@ -268,336 +283,103 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 	return afs_do_sync_operation(op);
 }
 
-/*
- * read page from file, directory or symlink, given a key to use
- */
-int afs_page_filler(void *data, struct page *page)
+static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
 {
-	struct inode *inode = page->mapping->host;
-	struct afs_vnode *vnode = AFS_FS_I(inode);
-	struct afs_read *req;
-	struct key *key = data;
-	int ret;
+	struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode);
+	struct afs_read *fsreq;
 
-	_enter("{%x},{%lu},{%lu}", key_serial(key), inode->i_ino, page->index);
+	fsreq = afs_alloc_read(GFP_NOFS);
+	if (!fsreq)
+		return netfs_subreq_terminated(subreq, -ENOMEM, false);
 
-	BUG_ON(!PageLocked(page));
+	fsreq->subreq	= subreq;
+	fsreq->pos	= subreq->start + subreq->transferred;
+	fsreq->len	= subreq->len - subreq->transferred;
+	fsreq->key	= subreq->rreq->netfs_priv;
+	fsreq->vnode	= vnode;
+	fsreq->iter	= &fsreq->def_iter;
 
-	ret = -ESTALE;
-	if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
-		goto error;
+	iov_iter_xarray(&fsreq->def_iter, READ,
+			&fsreq->vnode->vfs_inode.i_mapping->i_pages,
+			fsreq->pos, fsreq->len);
 
-	/* is it cached? */
-#ifdef CONFIG_AFS_FSCACHE
-	ret = fscache_read_or_alloc_page(vnode->cache,
-					 page,
-					 afs_file_readpage_read_complete,
-					 NULL,
-					 GFP_KERNEL);
-#else
-	ret = -ENOBUFS;
-#endif
-	switch (ret) {
-		/* read BIO submitted (page in cache) */
-	case 0:
-		break;
-
-		/* page not yet cached */
-	case -ENODATA:
-		_debug("cache said ENODATA");
-		goto go_on;
-
-		/* page will not be cached */
-	case -ENOBUFS:
-		_debug("cache said ENOBUFS");
-
-		fallthrough;
-	default:
-	go_on:
-		req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
-		if (!req)
-			goto enomem;
-
-		/* We request a full page.  If the page is a partial one at the
-		 * end of the file, the server will return a short read and the
-		 * unmarshalling code will clear the unfilled space.
-		 */
-		refcount_set(&req->usage, 1);
-		req->pos = (loff_t)page->index << PAGE_SHIFT;
-		req->len = PAGE_SIZE;
-		req->nr_pages = 1;
-		req->pages = req->array;
-		req->pages[0] = page;
-		get_page(page);
-
-		/* read the contents of the file from the server into the
-		 * page */
-		ret = afs_fetch_data(vnode, key, req);
-		afs_put_read(req);
-
-		if (ret < 0) {
-			if (ret == -ENOENT) {
-				_debug("got NOENT from server"
-				       " - marking file deleted and stale");
-				set_bit(AFS_VNODE_DELETED, &vnode->flags);
-				ret = -ESTALE;
-			}
-
-#ifdef CONFIG_AFS_FSCACHE
-			fscache_uncache_page(vnode->cache, page);
-#endif
-			BUG_ON(PageFsCache(page));
-
-			if (ret == -EINTR ||
-			    ret == -ENOMEM ||
-			    ret == -ERESTARTSYS ||
-			    ret == -EAGAIN)
-				goto error;
-			goto io_error;
-		}
-
-		SetPageUptodate(page);
-
-		/* send the page to the cache */
-#ifdef CONFIG_AFS_FSCACHE
-		if (PageFsCache(page) &&
-		    fscache_write_page(vnode->cache, page, vnode->status.size,
-				       GFP_KERNEL) != 0) {
-			fscache_uncache_page(vnode->cache, page);
-			BUG_ON(PageFsCache(page));
-		}
-#endif
-		unlock_page(page);
-	}
-
-	_leave(" = 0");
-	return 0;
-
-io_error:
-	SetPageError(page);
-	goto error;
-enomem:
-	ret = -ENOMEM;
-error:
-	unlock_page(page);
-	_leave(" = %d", ret);
+	afs_fetch_data(fsreq->vnode, fsreq);
+}
+
+static int afs_symlink_readpage(struct page *page)
+{
+	struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+	struct afs_read *fsreq;
+	int ret;
+
+	fsreq = afs_alloc_read(GFP_NOFS);
+	if (!fsreq)
+		return -ENOMEM;
+
+	fsreq->pos	= page->index * PAGE_SIZE;
+	fsreq->len	= PAGE_SIZE;
+	fsreq->vnode	= vnode;
+	fsreq->iter	= &fsreq->def_iter;
+	iov_iter_xarray(&fsreq->def_iter, READ, &page->mapping->i_pages,
+			fsreq->pos, fsreq->len);
+
+	ret = afs_fetch_data(fsreq->vnode, fsreq);
+	page_endio(page, false, ret);
 	return ret;
 }
 
-/*
- * read page from file, directory or symlink, given a file to nominate the key
- * to be used
- */
-static int afs_readpage(struct file *file, struct page *page)
+static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file)
 {
-	struct key *key;
-	int ret;
-
-	if (file) {
-		key = afs_file_key(file);
-		ASSERT(key != NULL);
-		ret = afs_page_filler(key, page);
-	} else {
-		struct inode *inode = page->mapping->host;
-		key = afs_request_key(AFS_FS_S(inode->i_sb)->cell);
-		if (IS_ERR(key)) {
-			ret = PTR_ERR(key);
-		} else {
-			ret = afs_page_filler(key, page);
-			key_put(key);
-		}
-	}
-	return ret;
+	rreq->netfs_priv = key_get(afs_file_key(file));
 }
 
-/*
- * Make pages available as they're filled.
- */
-static void afs_readpages_page_done(struct afs_read *req)
+static bool afs_is_cache_enabled(struct inode *inode)
 {
-#ifdef CONFIG_AFS_FSCACHE
-	struct afs_vnode *vnode = req->vnode;
-#endif
-	struct page *page = req->pages[req->index];
+	struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode));
 
-	req->pages[req->index] = NULL;
-	SetPageUptodate(page);
-
-	/* send the page to the cache */
-#ifdef CONFIG_AFS_FSCACHE
-	if (PageFsCache(page) &&
-	    fscache_write_page(vnode->cache, page, vnode->status.size,
-			       GFP_KERNEL) != 0) {
-		fscache_uncache_page(vnode->cache, page);
-		BUG_ON(PageFsCache(page));
-	}
-#endif
-	unlock_page(page);
-	put_page(page);
+	return fscache_cookie_enabled(cookie) && !hlist_empty(&cookie->backing_objects);
 }
 
-/*
- * Read a contiguous set of pages.
- */
-static int afs_readpages_one(struct file *file, struct address_space *mapping,
-			     struct list_head *pages)
+static int afs_begin_cache_operation(struct netfs_read_request *rreq)
 {
-	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
-	struct afs_read *req;
-	struct list_head *p;
-	struct page *first, *page;
-	struct key *key = afs_file_key(file);
-	pgoff_t index;
-	int ret, n, i;
+	struct afs_vnode *vnode = AFS_FS_I(rreq->inode);
 
-	/* Count the number of contiguous pages at the front of the list.  Note
-	 * that the list goes prev-wards rather than next-wards.
-	 */
-	first = lru_to_page(pages);
-	index = first->index + 1;
-	n = 1;
-	for (p = first->lru.prev; p != pages; p = p->prev) {
-		page = list_entry(p, struct page, lru);
-		if (page->index != index)
-			break;
-		index++;
-		n++;
-	}
-
-	req = kzalloc(struct_size(req, array, n), GFP_NOFS);
-	if (!req)
-		return -ENOMEM;
-
-	refcount_set(&req->usage, 1);
-	req->vnode = vnode;
-	req->page_done = afs_readpages_page_done;
-	req->pos = first->index;
-	req->pos <<= PAGE_SHIFT;
-	req->pages = req->array;
-
-	/* Transfer the pages to the request.  We add them in until one fails
-	 * to add to the LRU and then we stop (as that'll make a hole in the
-	 * contiguous run.
-	 *
-	 * Note that it's possible for the file size to change whilst we're
-	 * doing this, but we rely on the server returning less than we asked
-	 * for if the file shrank.  We also rely on this to deal with a partial
-	 * page at the end of the file.
-	 */
-	do {
-		page = lru_to_page(pages);
-		list_del(&page->lru);
-		index = page->index;
-		if (add_to_page_cache_lru(page, mapping, index,
-					  readahead_gfp_mask(mapping))) {
-#ifdef CONFIG_AFS_FSCACHE
-			fscache_uncache_page(vnode->cache, page);
-#endif
-			put_page(page);
-			break;
-		}
-
-		req->pages[req->nr_pages++] = page;
-		req->len += PAGE_SIZE;
-	} while (req->nr_pages < n);
-
-	if (req->nr_pages == 0) {
-		kfree(req);
-		return 0;
-	}
-
-	ret = afs_fetch_data(vnode, key, req);
-	if (ret < 0)
-		goto error;
-
-	task_io_account_read(PAGE_SIZE * req->nr_pages);
-	afs_put_read(req);
-	return 0;
-
-error:
-	if (ret == -ENOENT) {
-		_debug("got NOENT from server"
-		       " - marking file deleted and stale");
-		set_bit(AFS_VNODE_DELETED, &vnode->flags);
-		ret = -ESTALE;
-	}
-
-	for (i = 0; i < req->nr_pages; i++) {
-		page = req->pages[i];
-		if (page) {
-#ifdef CONFIG_AFS_FSCACHE
-			fscache_uncache_page(vnode->cache, page);
-#endif
-			SetPageError(page);
-			unlock_page(page);
-		}
-	}
-
-	afs_put_read(req);
-	return ret;
+	return fscache_begin_read_operation(rreq, afs_vnode_cache(vnode));
 }
 
-/*
- * read a set of pages
- */
-static int afs_readpages(struct file *file, struct address_space *mapping,
-			 struct list_head *pages, unsigned nr_pages)
+static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
+				 struct page *page, void **_fsdata)
 {
-	struct key *key = afs_file_key(file);
-	struct afs_vnode *vnode;
-	int ret = 0;
+	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
 
-	_enter("{%d},{%lu},,%d",
-	       key_serial(key), mapping->host->i_ino, nr_pages);
+	return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
+}
 
-	ASSERT(key != NULL);
+static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
+{
+	key_put(netfs_priv);
+}
 
-	vnode = AFS_FS_I(mapping->host);
-	if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) {
-		_leave(" = -ESTALE");
-		return -ESTALE;
-	}
+const struct netfs_read_request_ops afs_req_ops = {
+	.init_rreq		= afs_init_rreq,
+	.is_cache_enabled	= afs_is_cache_enabled,
+	.begin_cache_operation	= afs_begin_cache_operation,
+	.check_write_begin	= afs_check_write_begin,
+	.issue_op		= afs_req_issue_op,
+	.cleanup		= afs_priv_cleanup,
+};
 
-	/* attempt to read as many of the pages as possible */
-#ifdef CONFIG_AFS_FSCACHE
-	ret = fscache_read_or_alloc_pages(vnode->cache,
-					  mapping,
-					  pages,
-					  &nr_pages,
-					  afs_file_readpage_read_complete,
-					  NULL,
-					  mapping_gfp_mask(mapping));
-#else
-	ret = -ENOBUFS;
-#endif
+static int afs_readpage(struct file *file, struct page *page)
+{
+	if (!file)
+		return afs_symlink_readpage(page);
 
-	switch (ret) {
-		/* all pages are being read from the cache */
-	case 0:
-		BUG_ON(!list_empty(pages));
-		BUG_ON(nr_pages != 0);
-		_leave(" = 0 [reading all]");
-		return 0;
-
-		/* there were pages that couldn't be read from the cache */
-	case -ENODATA:
-	case -ENOBUFS:
-		break;
-
-		/* other error */
-	default:
-		_leave(" = %d", ret);
-		return ret;
-	}
+	return netfs_readpage(file, page, &afs_req_ops, NULL);
+}
 
-	while (!list_empty(pages)) {
-		ret = afs_readpages_one(file, mapping, pages);
-		if (ret < 0)
-			break;
-	}
-
-	_leave(" = %d [netting]", ret);
-	return ret;
+static void afs_readahead(struct readahead_control *ractl)
+{
+	netfs_readahead(ractl, &afs_req_ops, NULL);
 }
 
 /*
@@ -625,8 +407,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
 		return;
 
 	/* We may need to shorten the dirty region */
-	f = afs_page_dirty_from(priv);
-	t = afs_page_dirty_to(priv);
+	f = afs_page_dirty_from(page, priv);
+	t = afs_page_dirty_to(page, priv);
 
 	if (t <= offset || f >= end)
 		return; /* Doesn't overlap */
@@ -644,17 +426,17 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
 	if (f == t)
 		goto undirty;
 
-	priv = afs_page_dirty(f, t);
+	priv = afs_page_dirty(page, f, t);
 	set_page_private(page, priv);
-	trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page->index, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page);
 	return;
 
 undirty:
-	trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page->index, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
 	clear_page_dirty_for_io(page);
 full_invalidate:
-	priv = (unsigned long)detach_page_private(page);
-	trace_afs_page_dirty(vnode, tracepoint_string("inval"), page->index, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
+	detach_page_private(page);
 }
 
 /*
@@ -669,20 +451,10 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 	BUG_ON(!PageLocked(page));
 
-#ifdef CONFIG_AFS_FSCACHE
-	/* we clean up only if the entire page is being invalidated */
-	if (offset == 0 && length == PAGE_SIZE) {
-		if (PageFsCache(page)) {
-			struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-			fscache_wait_on_page_write(vnode->cache, page);
-			fscache_uncache_page(vnode->cache, page);
-		}
-	}
-#endif
-
 	if (PagePrivate(page))
 		afs_invalidate_dirty(page, offset, length);
 
+	wait_on_page_fscache(page);
 	_leave("");
 }
@@ -693,7 +465,6 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 {
 	struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-	unsigned long priv;
 
 	_enter("{{%llx:%llu}[%lu],%lx},%x",
 	       vnode->fid.vid, vnode->fid.vnode, page->index, page->flags,
@@ -702,16 +473,16 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 	/* deny if page is being written to the cache and the caller hasn't
 	 * elected to wait */
 #ifdef CONFIG_AFS_FSCACHE
-	if (!fscache_maybe_release_page(vnode->cache, page, gfp_flags)) {
-		_leave(" = F [cache busy]");
-		return 0;
+	if (PageFsCache(page)) {
+		if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
+			return false;
+		wait_on_page_fscache(page);
 	}
 #endif
 
 	if (PagePrivate(page)) {
-		priv = (unsigned long)detach_page_private(page);
-		trace_afs_page_dirty(vnode, tracepoint_string("rel"),
-				     page->index, priv);
+		trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
+		detach_page_private(page);
 	}
 
 	/* indicate that the page can be released */
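
This is where the PG_fscache repurposing from the cover note lands: since the flag now means "DIO write to the cache in progress", paths that are about to modify or release a page wait for it to clear instead of calling the old fscache page functions. A minimal sketch, assuming the netfs helpers of this series; the surrounding function is hypothetical, PageFsCache()/wait_on_page_fscache() are the helpers used above:

	#include <linux/netfs.h>	/* wait_on_page_fscache() */
	#include <linux/pagemap.h>

	/* Hypothetical example: block until any in-flight write of this
	 * page to the cache has finished before the caller touches it.
	 */
	static void example_wait_before_modify(struct page *page)
	{
		if (PageFsCache(page))
			wait_on_page_fscache(page);
	}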
--- a/fs/afs/fs_operation.c
+++ b/fs/afs/fs_operation.c
@@ -198,8 +198,10 @@ void afs_wait_for_operation(struct afs_operation *op)
 	case -ECONNABORTED:
 		if (op->ops->aborted)
 			op->ops->aborted(op);
-		break;
+		fallthrough;
 	default:
+		if (op->ops->failed)
+			op->ops->failed(op);
 		break;
 	}
 
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -10,6 +10,7 @@
 #include <linux/sched.h>
 #include <linux/circ_buf.h>
 #include <linux/iversion.h>
+#include <linux/netfs.h>
 #include "internal.h"
 #include "afs_fs.h"
 #include "xdr_fs.h"
@@ -302,17 +303,15 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_read *req = op->fetch.req;
 	const __be32 *bp;
-	unsigned int size;
 	int ret;
 
-	_enter("{%u,%zu/%llu}",
-	       call->unmarshall, iov_iter_count(call->iter), req->actual_len);
+	_enter("{%u,%zu,%zu/%llu}",
+	       call->unmarshall, call->iov_len, iov_iter_count(call->iter),
+	       req->actual_len);
 
 	switch (call->unmarshall) {
 	case 0:
 		req->actual_len = 0;
-		req->index = 0;
-		req->offset = req->pos & (PAGE_SIZE - 1);
 		call->unmarshall++;
 		if (call->operation_ID == FSFETCHDATA64) {
 			afs_extract_to_tmp64(call);
@@ -322,7 +321,10 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 		}
 		fallthrough;
 
-		/* extract the returned data length */
+		/* Extract the returned data length into
+		 * ->actual_len.  This may indicate more or less data than was
+		 * requested will be returned.
+		 */
 	case 1:
 		_debug("extract data length");
 		ret = afs_extract_data(call, true);
@@ -331,44 +333,25 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 		req->actual_len = be64_to_cpu(call->tmp64);
 		_debug("DATA length: %llu", req->actual_len);
-		req->remain = min(req->len, req->actual_len);
-		if (req->remain == 0)
+
+		if (req->actual_len == 0)
 			goto no_more_data;
+
+		call->iter = req->iter;
+		call->iov_len = min(req->actual_len, req->len);
 		call->unmarshall++;
-
-	begin_page:
-		ASSERTCMP(req->index, <, req->nr_pages);
-		if (req->remain > PAGE_SIZE - req->offset)
-			size = PAGE_SIZE - req->offset;
-		else
-			size = req->remain;
-		call->bvec[0].bv_len = size;
-		call->bvec[0].bv_offset = req->offset;
-		call->bvec[0].bv_page = req->pages[req->index];
-		iov_iter_bvec(&call->def_iter, READ, call->bvec, 1, size);
-		ASSERTCMP(size, <=, PAGE_SIZE);
 		fallthrough;
 
 		/* extract the returned data */
 	case 2:
 		_debug("extract data %zu/%llu",
-		       iov_iter_count(call->iter), req->remain);
+		       iov_iter_count(call->iter), req->actual_len);
 
 		ret = afs_extract_data(call, true);
 		if (ret < 0)
 			return ret;
-		req->remain -= call->bvec[0].bv_len;
-		req->offset += call->bvec[0].bv_len;
-		ASSERTCMP(req->offset, <=, PAGE_SIZE);
-		if (req->offset == PAGE_SIZE) {
-			req->offset = 0;
-			req->index++;
-			if (req->remain > 0)
-				goto begin_page;
-		}
 
-		ASSERTCMP(req->remain, ==, 0);
+		call->iter = &call->def_iter;
 		if (req->actual_len <= req->len)
 			goto no_more_data;
@@ -410,17 +393,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 		break;
 	}
 
-	for (; req->index < req->nr_pages; req->index++) {
-		if (req->offset < PAGE_SIZE)
-			zero_user_segment(req->pages[req->index],
-					  req->offset, PAGE_SIZE);
-		req->offset = 0;
-	}
-
-	if (req->page_done)
-		for (req->index = 0; req->index < req->nr_pages; req->index++)
-			req->page_done(req);
-
 	_leave(" = 0 [done]");
 	return 0;
 }
@@ -494,6 +466,8 @@ void afs_fs_fetch_data(struct afs_operation *op)
 	if (!call)
 		return afs_op_nomem(op);
 
+	req->call_debug_id = call->debug_id;
+
 	/* marshall the parameters */
 	bp = call->request;
 	bp[0] = htonl(FSFETCHDATA);
@@ -1079,8 +1053,7 @@ static const struct afs_call_type afs_RXFSStoreData64 = {
 /*
  * store a set of pages to a very large file
  */
-static void afs_fs_store_data64(struct afs_operation *op,
-				loff_t pos, loff_t size, loff_t i_size)
+static void afs_fs_store_data64(struct afs_operation *op)
 {
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_call *call;
@@ -1095,7 +1068,7 @@ static void afs_fs_store_data64(struct afs_operation *op)
 	if (!call)
 		return afs_op_nomem(op);
 
-	call->send_pages = true;
+	call->write_iter = op->store.write_iter;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1111,47 +1084,38 @@ static void afs_fs_store_data64(struct afs_operation *op)
 	*bp++ = 0; /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	*bp++ = htonl(upper_32_bits(pos));
-	*bp++ = htonl(lower_32_bits(pos));
-	*bp++ = htonl(upper_32_bits(size));
-	*bp++ = htonl(lower_32_bits(size));
-	*bp++ = htonl(upper_32_bits(i_size));
-	*bp++ = htonl(lower_32_bits(i_size));
+	*bp++ = htonl(upper_32_bits(op->store.pos));
+	*bp++ = htonl(lower_32_bits(op->store.pos));
+	*bp++ = htonl(upper_32_bits(op->store.size));
+	*bp++ = htonl(lower_32_bits(op->store.size));
+	*bp++ = htonl(upper_32_bits(op->store.i_size));
+	*bp++ = htonl(lower_32_bits(op->store.i_size));
 
 	trace_afs_make_fs_call(call, &vp->fid);
 	afs_make_op_call(op, call, GFP_NOFS);
 }
 
 /*
- * store a set of pages
+ * Write data to a file on the server.
  */
 void afs_fs_store_data(struct afs_operation *op)
 {
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_call *call;
-	loff_t size, pos, i_size;
 	__be32 *bp;
 
 	_enter(",%x,{%llx:%llu},,",
 	       key_serial(op->key), vp->fid.vid, vp->fid.vnode);
 
-	size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset;
-	if (op->store.first != op->store.last)
-		size += (loff_t)(op->store.last - op->store.first) << PAGE_SHIFT;
-	pos = (loff_t)op->store.first << PAGE_SHIFT;
-	pos += op->store.first_offset;
-
-	i_size = i_size_read(&vp->vnode->vfs_inode);
-	if (pos + size > i_size)
-		i_size = size + pos;
-
 	_debug("size %llx, at %llx, i_size %llx",
-	       (unsigned long long) size, (unsigned long long) pos,
-	       (unsigned long long) i_size);
+	       (unsigned long long)op->store.size,
+	       (unsigned long long)op->store.pos,
+	       (unsigned long long)op->store.i_size);
 
-	if (upper_32_bits(pos) || upper_32_bits(i_size) || upper_32_bits(size) ||
-	    upper_32_bits(pos + size))
-		return afs_fs_store_data64(op, pos, size, i_size);
+	if (upper_32_bits(op->store.pos) ||
+	    upper_32_bits(op->store.size) ||
+	    upper_32_bits(op->store.i_size))
+		return afs_fs_store_data64(op);
 
 	call = afs_alloc_flat_call(op->net, &afs_RXFSStoreData,
 				   (4 + 6 + 3) * 4,
@@ -1159,7 +1123,7 @@ void afs_fs_store_data(struct afs_operation *op)
 	if (!call)
 		return afs_op_nomem(op);
 
-	call->send_pages = true;
+	call->write_iter = op->store.write_iter;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1175,9 +1139,9 @@ void afs_fs_store_data(struct afs_operation *op)
 	*bp++ = 0; /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	*bp++ = htonl(lower_32_bits(pos));
-	*bp++ = htonl(lower_32_bits(size));
-	*bp++ = htonl(lower_32_bits(i_size));
+	*bp++ = htonl(lower_32_bits(op->store.pos));
+	*bp++ = htonl(lower_32_bits(op->store.size));
+	*bp++ = htonl(lower_32_bits(op->store.i_size));
 
 	trace_afs_make_fs_call(call, &vp->fid);
 	afs_make_op_call(op, call, GFP_NOFS);
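
A quick standalone check of the marshalling arithmetic used in afs_fs_store_data() above: 64-bit values are split with upper_32_bits()/lower_32_bits() and sent as two network-order words. Userspace equivalent, purely illustrative:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t pos = 0x123456789abcULL;
		uint32_t hi = (uint32_t)(pos >> 32);	/* upper_32_bits(pos) */
		uint32_t lo = (uint32_t)pos;		/* lower_32_bits(pos) */

		/* The kernel side additionally byte-swaps each word with
		 * htonl() before writing it into the request buffer.
		 */
		printf("%08x %08x\n", hi, lo);		/* prints: 00001234 56789abc */
		return 0;
	}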
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -214,11 +214,12 @@ static void afs_apply_status(struct afs_operation *op,
 	if (vp->dv_before + vp->dv_delta != status->data_version) {
 		if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags))
-			pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s\n",
+			pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s (op=%x)\n",
 				vnode->fid.vid, vnode->fid.vnode,
 				(unsigned long long)vp->dv_before + vp->dv_delta,
 				(unsigned long long)status->data_version,
-				op->type ? op->type->name : "???");
+				op->type ? op->type->name : "???",
+				op->debug_id);
 
 		vnode->invalid_before = status->data_version;
 		if (vnode->status.type == AFS_FTYPE_DIR) {
@@ -427,7 +428,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
 	} __packed key;
 	struct afs_vnode_cache_aux aux;
 
-	if (vnode->status.type == AFS_FTYPE_DIR) {
+	if (vnode->status.type != AFS_FTYPE_FILE) {
 		vnode->cache = NULL;
 		return;
 	}
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -14,6 +14,7 @@
 #include <linux/key.h>
 #include <linux/workqueue.h>
 #include <linux/sched.h>
+#define FSCACHE_USE_NEW_IO_API
 #include <linux/fscache.h>
 #include <linux/backing-dev.h>
 #include <linux/uuid.h>
@@ -31,6 +32,7 @@
 
 struct pagevec;
 struct afs_call;
+struct afs_vnode;
 
 /*
  * Partial file-locking emulation mode.  (The problem being that AFS3 only
@@ -104,7 +106,9 @@ struct afs_call {
 	struct afs_server	*server;	/* The fileserver record if fs op (pins ref) */
 	struct afs_vlserver	*vlserver;	/* The vlserver record if vl op */
 	void			*request;	/* request data (first part) */
+	size_t			iov_len;	/* Size of *iter to be used */
 	struct iov_iter		def_iter;	/* Default buffer/data iterator */
+	struct iov_iter		*write_iter;	/* Iterator defining write to be made */
 	struct iov_iter		*iter;		/* Iterator currently in use */
 	union {	/* Convenience for ->def_iter */
 		struct kvec	kvec[1];
@@ -131,7 +135,6 @@ struct afs_call {
 	unsigned char		unmarshall;	/* unmarshalling phase */
 	unsigned char		addr_ix;	/* Address in ->alist */
 	bool			drop_ref;	/* T if need to drop ref for incoming call */
-	bool			send_pages;	/* T if data from mapping should be sent */
 	bool			need_attention;	/* T if RxRPC poked us */
 	bool			async;		/* T if asynchronous */
 	bool			upgrade;	/* T to request service upgrade */
@@ -202,17 +205,19 @@ struct afs_read {
 	loff_t			pos;		/* Where to start reading */
 	loff_t			len;		/* How much we're asking for */
 	loff_t			actual_len;	/* How much we're actually getting */
-	loff_t			remain;		/* Amount remaining */
 	loff_t			file_size;	/* File size returned by server */
+	struct key		*key;		/* The key to use to reissue the read */
+	struct afs_vnode	*vnode;		/* The file being read into. */
+	struct netfs_read_subrequest *subreq;	/* Fscache helper read request this belongs to */
 	afs_dataversion_t	data_version;	/* Version number returned by server */
 	refcount_t		usage;
-	unsigned int		index;		/* Which page we're reading into */
+	unsigned int		call_debug_id;
 	unsigned int		nr_pages;
-	unsigned int		offset;		/* offset into current page */
-	struct afs_vnode	*vnode;
-	void (*page_done)(struct afs_read *);
-	struct page		**pages;
-	struct page		*array[];
+	int			error;
+	void (*done)(struct afs_read *);
+	void (*cleanup)(struct afs_read *);
+	struct iov_iter		*iter;		/* Iterator representing the buffer */
+	struct iov_iter		def_iter;	/* Default iterator */
 };
 
 /*
@@ -739,6 +744,7 @@ struct afs_operation_ops {
 	void (*issue_yfs_rpc)(struct afs_operation *op);
 	void (*success)(struct afs_operation *op);
 	void (*aborted)(struct afs_operation *op);
+	void (*failed)(struct afs_operation *op);
 	void (*edit_dir)(struct afs_operation *op);
 	void (*put)(struct afs_operation *op);
 };
@@ -808,12 +814,11 @@ struct afs_operation {
 			afs_lock_type_t type;
 		} lock;
 		struct {
-			struct address_space *mapping;	/* Pages being written from */
-			pgoff_t		first;		/* first page in mapping to deal with */
-			pgoff_t		last;		/* last page in mapping to deal with */
-			unsigned	first_offset;	/* offset into mapping[first] */
-			unsigned	last_to;	/* amount of mapping[last] */
+			struct iov_iter	*write_iter;
+			loff_t	pos;
+			loff_t	size;
+			loff_t	i_size;
 			bool		laundering;	/* Laundering page, PG_writeback not set */
 		} store;
 		struct {
 			struct iattr	*attr;
#define __AFS_PAGE_PRIV_MMAPPED 0x8000UL #define __AFS_PAGE_PRIV_MMAPPED 0x8000UL
#endif #endif
static inline unsigned int afs_page_dirty_resolution(void) static inline unsigned int afs_page_dirty_resolution(struct page *page)
{ {
int shift = PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1); int shift = thp_order(page) + PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
return (shift > 0) ? shift : 0; return (shift > 0) ? shift : 0;
} }
static inline size_t afs_page_dirty_from(unsigned long priv) static inline size_t afs_page_dirty_from(struct page *page, unsigned long priv)
{ {
unsigned long x = priv & __AFS_PAGE_PRIV_MASK; unsigned long x = priv & __AFS_PAGE_PRIV_MASK;
/* The lower bound is inclusive */ /* The lower bound is inclusive */
return x << afs_page_dirty_resolution(); return x << afs_page_dirty_resolution(page);
} }
static inline size_t afs_page_dirty_to(unsigned long priv) static inline size_t afs_page_dirty_to(struct page *page, unsigned long priv)
{ {
unsigned long x = (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK; unsigned long x = (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
/* The upper bound is immediately beyond the region */ /* The upper bound is immediately beyond the region */
return (x + 1) << afs_page_dirty_resolution(); return (x + 1) << afs_page_dirty_resolution(page);
} }
static inline unsigned long afs_page_dirty(size_t from, size_t to) static inline unsigned long afs_page_dirty(struct page *page, size_t from, size_t to)
{ {
unsigned int res = afs_page_dirty_resolution(); unsigned int res = afs_page_dirty_resolution(page);
from >>= res; from >>= res;
to = (to - 1) >> res; to = (to - 1) >> res;
return (to << __AFS_PAGE_PRIV_SHIFT) | from; return (to << __AFS_PAGE_PRIV_SHIFT) | from;
@@ -1040,13 +1045,14 @@ extern void afs_dynroot_depopulate(struct super_block *);
 extern const struct address_space_operations	afs_fs_aops;
 extern const struct inode_operations	afs_file_inode_operations;
 extern const struct file_operations	afs_file_operations;
+extern const struct netfs_read_request_ops afs_req_ops;
 
 extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *);
 extern void afs_put_wb_key(struct afs_wb_key *);
 extern int afs_open(struct inode *, struct file *);
 extern int afs_release(struct inode *, struct file *);
-extern int afs_fetch_data(struct afs_vnode *, struct key *, struct afs_read *);
-extern int afs_page_filler(void *, struct page *);
+extern int afs_fetch_data(struct afs_vnode *, struct afs_read *);
+extern struct afs_read *afs_alloc_read(gfp_t);
 extern void afs_put_read(struct afs_read *);
 
 static inline struct afs_read *afs_get_read(struct afs_read *req)
@@ -1270,6 +1276,7 @@ static inline void afs_make_op_call(struct afs_operation *op, struct afs_call *c
 static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t size)
 {
+	call->iov_len = size;
 	call->kvec[0].iov_base = buf;
 	call->kvec[0].iov_len = size;
 	iov_iter_kvec(&call->def_iter, READ, call->kvec, 1, size);
@@ -1277,21 +1284,25 @@ static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t si
 static inline void afs_extract_to_tmp(struct afs_call *call)
 {
+	call->iov_len = sizeof(call->tmp);
 	afs_extract_begin(call, &call->tmp, sizeof(call->tmp));
 }
 
 static inline void afs_extract_to_tmp64(struct afs_call *call)
 {
+	call->iov_len = sizeof(call->tmp64);
 	afs_extract_begin(call, &call->tmp64, sizeof(call->tmp64));
 }
 
 static inline void afs_extract_discard(struct afs_call *call, size_t size)
 {
+	call->iov_len = size;
 	iov_iter_discard(&call->def_iter, READ, size);
 }
 
 static inline void afs_extract_to_buf(struct afs_call *call, size_t size)
 {
+	call->iov_len = size;
 	afs_extract_begin(call, call->buffer, size);
 }
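
To make the afs_page_dirty*() helpers above concrete: the dirty byte range is packed into page->private as two fields, the lower bound stored inclusive and the upper bound stored as "last byte" so a range ending exactly at the field limit still fits. A standalone illustration, assuming a 64-bit unsigned long, a resolution shift of 0, and a 32-bit field split chosen for clarity (the real code derives the field sizes and shift from the page size):

	#include <stdio.h>

	#define PRIV_SHIFT	32	/* assumed field split for illustration */
	#define PRIV_MASK	((1UL << PRIV_SHIFT) - 1)

	int main(void)
	{
		unsigned long from = 0x100, to = 0xa00;	/* dirty bytes [0x100, 0xa00) */

		/* Encode as afs_page_dirty() does: store 'to - 1'. */
		unsigned long priv = ((to - 1) << PRIV_SHIFT) | from;

		/* Decode as afs_page_dirty_from()/afs_page_dirty_to() do. */
		printf("from=%#lx to=%#lx\n",
		       priv & PRIV_MASK,
		       ((priv >> PRIV_SHIFT) & PRIV_MASK) + 1);
		return 0;	/* prints: from=0x100 to=0xa00 */
	}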
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -271,40 +271,6 @@ void afs_flat_call_destructor(struct afs_call *call)
 	call->buffer = NULL;
 }
 
-#define AFS_BVEC_MAX 8
-
-/*
- * Load the given bvec with the next few pages.
- */
-static void afs_load_bvec(struct afs_call *call, struct msghdr *msg,
-			  struct bio_vec *bv, pgoff_t first, pgoff_t last,
-			  unsigned offset)
-{
-	struct afs_operation *op = call->op;
-	struct page *pages[AFS_BVEC_MAX];
-	unsigned int nr, n, i, to, bytes = 0;
-
-	nr = min_t(pgoff_t, last - first + 1, AFS_BVEC_MAX);
-	n = find_get_pages_contig(op->store.mapping, first, nr, pages);
-	ASSERTCMP(n, ==, nr);
-
-	msg->msg_flags |= MSG_MORE;
-	for (i = 0; i < nr; i++) {
-		to = PAGE_SIZE;
-		if (first + i >= last) {
-			to = op->store.last_to;
-			msg->msg_flags &= ~MSG_MORE;
-		}
-		bv[i].bv_page = pages[i];
-		bv[i].bv_len = to - offset;
-		bv[i].bv_offset = offset;
-		bytes += to - offset;
-		offset = 0;
-	}
-
-	iov_iter_bvec(&msg->msg_iter, WRITE, bv, nr, bytes);
-}
-
 /*
  * Advance the AFS call state when the RxRPC call ends the transmit phase.
  */
@@ -317,42 +283,6 @@ static void afs_notify_end_request_tx(struct sock *sock,
 	afs_set_call_state(call, AFS_CALL_CL_REQUESTING, AFS_CALL_CL_AWAIT_REPLY);
 }
 
-/*
- * attach the data from a bunch of pages on an inode to a call
- */
-static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
-{
-	struct afs_operation *op = call->op;
-	struct bio_vec bv[AFS_BVEC_MAX];
-	unsigned int bytes, nr, loop, offset;
-	pgoff_t first = op->store.first, last = op->store.last;
-	int ret;
-
-	offset = op->store.first_offset;
-	op->store.first_offset = 0;
-
-	do {
-		afs_load_bvec(call, msg, bv, first, last, offset);
-		trace_afs_send_pages(call, msg, first, last, offset);
-
-		offset = 0;
-		bytes = msg->msg_iter.count;
-		nr = msg->msg_iter.nr_segs;
-
-		ret = rxrpc_kernel_send_data(op->net->socket, call->rxcall, msg,
-					     bytes, afs_notify_end_request_tx);
-		for (loop = 0; loop < nr; loop++)
-			put_page(bv[loop].bv_page);
-		if (ret < 0)
-			break;
-
-		first += nr;
-	} while (first <= last);
-
-	trace_afs_sent_pages(call, op->store.first, last, first, ret);
-	return ret;
-}
-
 /*
  * Initiate a call and synchronously queue up the parameters for dispatch.  Any
  * error is stored into the call struct, which the caller must check for.
...@@ -363,6 +293,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) ...@@ -363,6 +293,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
struct rxrpc_call *rxcall; struct rxrpc_call *rxcall;
struct msghdr msg; struct msghdr msg;
struct kvec iov[1]; struct kvec iov[1];
size_t len;
s64 tx_total_len; s64 tx_total_len;
int ret; int ret;
...@@ -383,21 +314,8 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) ...@@ -383,21 +314,8 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
* after the initial fixed part. * after the initial fixed part.
*/ */
tx_total_len = call->request_size; tx_total_len = call->request_size;
if (call->send_pages) { if (call->write_iter)
struct afs_operation *op = call->op; tx_total_len += iov_iter_count(call->write_iter);
if (op->store.last == op->store.first) {
tx_total_len += op->store.last_to - op->store.first_offset;
} else {
/* It looks mathematically like you should be able to
* combine the following lines with the ones above, but
* unsigned arithmetic is fun when it wraps...
*/
tx_total_len += PAGE_SIZE - op->store.first_offset;
tx_total_len += op->store.last_to;
tx_total_len += (op->store.last - op->store.first - 1) * PAGE_SIZE;
}
}
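
The removed branch derived the payload size from page indices and intra-page offsets, split into three additions to avoid unsigned wrap-around; with an iterator the answer is simply iov_iter_count(). A worked example with illustrative values:

    /* Old scheme, e.g. first = 2, last = 4, first_offset = 0x100,
     * last_to = 0x80, with 4KiB pages:
     *   PAGE_SIZE - 0x100    = 0x0f00   tail of the first page
     * + 0x80                 = 0x0f80   plus head of the last page
     * + (4 - 2 - 1) * 0x1000 = 0x1f80   plus one whole page in between
     * New scheme: tx_total_len += iov_iter_count(call->write_iter);
     */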
/* If the call is going to be asynchronous, we need an extra ref for /* If the call is going to be asynchronous, we need an extra ref for
* the call to hold itself so the caller need not hang on to its ref. * the call to hold itself so the caller need not hang on to its ref.
...@@ -439,7 +357,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) ...@@ -439,7 +357,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, call->request_size); iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, call->request_size);
msg.msg_control = NULL; msg.msg_control = NULL;
msg.msg_controllen = 0; msg.msg_controllen = 0;
msg.msg_flags = MSG_WAITALL | (call->send_pages ? MSG_MORE : 0); msg.msg_flags = MSG_WAITALL | (call->write_iter ? MSG_MORE : 0);
ret = rxrpc_kernel_send_data(call->net->socket, rxcall, ret = rxrpc_kernel_send_data(call->net->socket, rxcall,
&msg, call->request_size, &msg, call->request_size,
...@@ -447,8 +365,18 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) ...@@ -447,8 +365,18 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
if (ret < 0) if (ret < 0)
goto error_do_abort; goto error_do_abort;
if (call->send_pages) { if (call->write_iter) {
ret = afs_send_pages(call, &msg); msg.msg_iter = *call->write_iter;
msg.msg_flags &= ~MSG_MORE;
trace_afs_send_data(call, &msg);
ret = rxrpc_kernel_send_data(call->net->socket,
call->rxcall, &msg,
iov_iter_count(&msg.msg_iter),
afs_notify_end_request_tx);
*call->write_iter = msg.msg_iter;
trace_afs_sent_data(call, &msg, ret);
if (ret < 0) if (ret < 0)
goto error_do_abort; goto error_do_abort;
} }
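
Note the copy-in/copy-out discipline around the iterator: msg.msg_iter works on a snapshot, and the advanced state is written back so the caller can observe how far transmission got after a partial send. Condensed, with error handling elided:

    msg.msg_iter = *call->write_iter;      /* work on a snapshot */
    ret = rxrpc_kernel_send_data(call->net->socket, call->rxcall, &msg,
                                 iov_iter_count(&msg.msg_iter),
                                 afs_notify_end_request_tx);
    *call->write_iter = msg.msg_iter;      /* publish the progress made */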
...@@ -466,9 +394,10 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) ...@@ -466,9 +394,10 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
rxrpc_kernel_abort_call(call->net->socket, rxcall, rxrpc_kernel_abort_call(call->net->socket, rxcall,
RX_USER_ABORT, ret, "KSD"); RX_USER_ABORT, ret, "KSD");
} else { } else {
len = 0;
iov_iter_kvec(&msg.msg_iter, READ, NULL, 0, 0); iov_iter_kvec(&msg.msg_iter, READ, NULL, 0, 0);
rxrpc_kernel_recv_data(call->net->socket, rxcall, rxrpc_kernel_recv_data(call->net->socket, rxcall,
&msg.msg_iter, false, &msg.msg_iter, &len, false,
&call->abort_code, &call->service_id); &call->abort_code, &call->service_id);
ac->abort_code = call->abort_code; ac->abort_code = call->abort_code;
ac->responded = true; ac->responded = true;
...@@ -498,12 +427,46 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) ...@@ -498,12 +427,46 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
_leave(" = %d", ret); _leave(" = %d", ret);
} }
/*
* Log remote abort codes that indicate that we have a protocol disagreement
* with the server.
*/
static void afs_log_error(struct afs_call *call, s32 remote_abort)
{
static int max = 0;
const char *msg;
int m;
switch (remote_abort) {
case RX_EOF: msg = "unexpected EOF"; break;
case RXGEN_CC_MARSHAL: msg = "client marshalling"; break;
case RXGEN_CC_UNMARSHAL: msg = "client unmarshalling"; break;
case RXGEN_SS_MARSHAL: msg = "server marshalling"; break;
case RXGEN_SS_UNMARSHAL: msg = "server unmarshalling"; break;
case RXGEN_DECODE: msg = "opcode decode"; break;
case RXGEN_SS_XDRFREE: msg = "server XDR cleanup"; break;
case RXGEN_CC_XDRFREE: msg = "client XDR cleanup"; break;
case -32: msg = "insufficient data"; break;
default:
return;
}
m = max;
if (m < 3) {
max = m + 1;
pr_notice("kAFS: Peer reported %s failure on %s [%pISp]\n",
msg, call->type->name,
&call->alist->addrs[call->addr_ix].transport);
}
}
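
The hand-rolled "static int max" cap limits this to three reports per boot. The more common kernel idiom is a rate limit, sketched below as an alternative; note the semantics differ (per-interval rather than per-boot):

    /* Alternative sketch: rate-limited rather than three-per-boot. */
    pr_notice_ratelimited("kAFS: Peer reported %s failure on %s [%pISp]\n",
                          msg, call->type->name,
                          &call->alist->addrs[call->addr_ix].transport);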
/* /*
* deliver messages to a call * deliver messages to a call
*/ */
static void afs_deliver_to_call(struct afs_call *call) static void afs_deliver_to_call(struct afs_call *call)
{ {
enum afs_call_state state; enum afs_call_state state;
size_t len;
u32 abort_code, remote_abort = 0; u32 abort_code, remote_abort = 0;
int ret; int ret;
...@@ -516,10 +479,11 @@ static void afs_deliver_to_call(struct afs_call *call) ...@@ -516,10 +479,11 @@ static void afs_deliver_to_call(struct afs_call *call)
state == AFS_CALL_SV_AWAIT_ACK state == AFS_CALL_SV_AWAIT_ACK
) { ) {
if (state == AFS_CALL_SV_AWAIT_ACK) { if (state == AFS_CALL_SV_AWAIT_ACK) {
len = 0;
iov_iter_kvec(&call->def_iter, READ, NULL, 0, 0); iov_iter_kvec(&call->def_iter, READ, NULL, 0, 0);
ret = rxrpc_kernel_recv_data(call->net->socket, ret = rxrpc_kernel_recv_data(call->net->socket,
call->rxcall, &call->def_iter, call->rxcall, &call->def_iter,
false, &remote_abort, &len, false, &remote_abort,
&call->service_id); &call->service_id);
trace_afs_receive_data(call, &call->def_iter, false, ret); trace_afs_receive_data(call, &call->def_iter, false, ret);
...@@ -559,6 +523,7 @@ static void afs_deliver_to_call(struct afs_call *call) ...@@ -559,6 +523,7 @@ static void afs_deliver_to_call(struct afs_call *call)
goto out; goto out;
case -ECONNABORTED: case -ECONNABORTED:
ASSERTCMP(state, ==, AFS_CALL_COMPLETE); ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
afs_log_error(call, call->abort_code);
goto done; goto done;
case -ENOTSUPP: case -ENOTSUPP:
abort_code = RXGEN_OPCODE; abort_code = RXGEN_OPCODE;
...@@ -929,10 +894,11 @@ int afs_extract_data(struct afs_call *call, bool want_more) ...@@ -929,10 +894,11 @@ int afs_extract_data(struct afs_call *call, bool want_more)
u32 remote_abort = 0; u32 remote_abort = 0;
int ret; int ret;
_enter("{%s,%zu},%d", call->type->name, iov_iter_count(iter), want_more); _enter("{%s,%zu,%zu},%d",
call->type->name, call->iov_len, iov_iter_count(iter), want_more);
ret = rxrpc_kernel_recv_data(net->socket, call->rxcall, iter, ret = rxrpc_kernel_recv_data(net->socket, call->rxcall, iter,
want_more, &remote_abort, &call->iov_len, want_more, &remote_abort,
&call->service_id); &call->service_id);
if (ret == 0 || ret == -EAGAIN) if (ret == 0 || ret == -EAGAIN)
return ret; return ret;
......
...@@ -11,6 +11,8 @@ ...@@ -11,6 +11,8 @@
#include <linux/pagemap.h> #include <linux/pagemap.h>
#include <linux/writeback.h> #include <linux/writeback.h>
#include <linux/pagevec.h> #include <linux/pagevec.h>
#include <linux/netfs.h>
#include <linux/fscache.h>
#include "internal.h" #include "internal.h"
/* /*
...@@ -22,55 +24,6 @@ int afs_set_page_dirty(struct page *page) ...@@ -22,55 +24,6 @@ int afs_set_page_dirty(struct page *page)
return __set_page_dirty_nobuffers(page); return __set_page_dirty_nobuffers(page);
} }
/*
* partly or wholly fill a page that's under preparation for writing
*/
static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
loff_t pos, unsigned int len, struct page *page)
{
struct afs_read *req;
size_t p;
void *data;
int ret;
_enter(",,%llu", (unsigned long long)pos);
if (pos >= vnode->vfs_inode.i_size) {
p = pos & ~PAGE_MASK;
ASSERTCMP(p + len, <=, PAGE_SIZE);
data = kmap(page);
memset(data + p, 0, len);
kunmap(page);
return 0;
}
req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
if (!req)
return -ENOMEM;
refcount_set(&req->usage, 1);
req->pos = pos;
req->len = len;
req->nr_pages = 1;
req->pages = req->array;
req->pages[0] = page;
get_page(page);
ret = afs_fetch_data(vnode, key, req);
afs_put_read(req);
if (ret < 0) {
if (ret == -ENOENT) {
_debug("got NOENT from server"
" - marking file deleted and stale");
set_bit(AFS_VNODE_DELETED, &vnode->flags);
ret = -ESTALE;
}
}
_leave(" = %d", ret);
return ret;
}
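
afs_fill_page() goes away because the pre-write read is delegated to netfs_write_begin(), which drives any needed reads through the afs_req_ops table passed below. For reference, a sketch of how such a table is wired up in the 5.13-era netfs API; the hooks shown match AFS's implementations elsewhere in this series, but treat the exact set as indicative rather than exhaustive:

    const struct netfs_read_request_ops afs_req_ops = {
            .init_rreq              = afs_init_rreq,        /* stash the key */
            .is_cache_enabled       = afs_is_cache_enabled,
            .begin_cache_operation  = afs_begin_cache_operation,
            .check_write_begin      = afs_check_write_begin,
            .issue_op               = afs_req_issue_op,     /* start a FetchData RPC */
            .cleanup                = afs_priv_cleanup,
    };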
/* /*
* prepare to perform part of a write to a page * prepare to perform part of a write to a page
*/ */
...@@ -80,47 +33,40 @@ int afs_write_begin(struct file *file, struct address_space *mapping, ...@@ -80,47 +33,40 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
{ {
struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
struct page *page; struct page *page;
struct key *key = afs_file_key(file);
unsigned long priv; unsigned long priv;
unsigned f, from = pos & (PAGE_SIZE - 1); unsigned f, from;
unsigned t, to = from + len; unsigned t, to;
pgoff_t index = pos >> PAGE_SHIFT; pgoff_t index;
int ret; int ret;
_enter("{%llx:%llu},{%lx},%u,%u", _enter("{%llx:%llu},%llx,%x",
vnode->fid.vid, vnode->fid.vnode, index, from, to); vnode->fid.vid, vnode->fid.vnode, pos, len);
page = grab_cache_page_write_begin(mapping, index, flags); /* Prefetch area to be written into the cache if we're caching this
if (!page) * file. We need to do this before we get a lock on the page in case
return -ENOMEM; * there's more than one writer competing for the same cache block.
*/
ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata,
&afs_req_ops, NULL);
if (ret < 0)
return ret;
if (!PageUptodate(page) && len != PAGE_SIZE) { index = page->index;
ret = afs_fill_page(vnode, key, pos & PAGE_MASK, PAGE_SIZE, page); from = pos - index * PAGE_SIZE;
if (ret < 0) { to = from + len;
unlock_page(page);
put_page(page);
_leave(" = %d [prep]", ret);
return ret;
}
SetPageUptodate(page);
}
try_again: try_again:
/* See if this page is already partially written in a way that we can /* See if this page is already partially written in a way that we can
* merge the new write with. * merge the new write with.
*/ */
t = f = 0;
if (PagePrivate(page)) { if (PagePrivate(page)) {
priv = page_private(page); priv = page_private(page);
f = afs_page_dirty_from(priv); f = afs_page_dirty_from(page, priv);
t = afs_page_dirty_to(priv); t = afs_page_dirty_to(page, priv);
ASSERTCMP(f, <=, t); ASSERTCMP(f, <=, t);
}
if (f != t) {
if (PageWriteback(page)) { if (PageWriteback(page)) {
trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page);
page->index, priv);
goto flush_conflicting_write; goto flush_conflicting_write;
} }
/* If the file is being filled locally, allow inter-write /* If the file is being filled locally, allow inter-write
...@@ -164,12 +110,10 @@ int afs_write_end(struct file *file, struct address_space *mapping, ...@@ -164,12 +110,10 @@ int afs_write_end(struct file *file, struct address_space *mapping,
struct page *page, void *fsdata) struct page *page, void *fsdata)
{ {
struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
struct key *key = afs_file_key(file);
unsigned long priv; unsigned long priv;
unsigned int f, from = pos & (PAGE_SIZE - 1); unsigned int f, from = pos & (thp_size(page) - 1);
unsigned int t, to = from + copied; unsigned int t, to = from + copied;
loff_t i_size, maybe_i_size; loff_t i_size, maybe_i_size;
int ret = 0;
_enter("{%llx:%llu},{%lx}", _enter("{%llx:%llu},{%lx}",
vnode->fid.vid, vnode->fid.vnode, page->index); vnode->fid.vid, vnode->fid.vnode, page->index);
...@@ -188,88 +132,75 @@ int afs_write_end(struct file *file, struct address_space *mapping, ...@@ -188,88 +132,75 @@ int afs_write_end(struct file *file, struct address_space *mapping,
write_sequnlock(&vnode->cb_lock); write_sequnlock(&vnode->cb_lock);
} }
if (!PageUptodate(page)) { ASSERT(PageUptodate(page));
if (copied < len) {
/* Try and load any missing data from the server. The
* unmarshalling routine will take care of clearing any
* bits that are beyond the EOF.
*/
ret = afs_fill_page(vnode, key, pos + copied,
len - copied, page);
if (ret < 0)
goto out;
}
SetPageUptodate(page);
}
if (PagePrivate(page)) { if (PagePrivate(page)) {
priv = page_private(page); priv = page_private(page);
f = afs_page_dirty_from(priv); f = afs_page_dirty_from(page, priv);
t = afs_page_dirty_to(priv); t = afs_page_dirty_to(page, priv);
if (from < f) if (from < f)
f = from; f = from;
if (to > t) if (to > t)
t = to; t = to;
priv = afs_page_dirty(f, t); priv = afs_page_dirty(page, f, t);
set_page_private(page, priv); set_page_private(page, priv);
trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), page);
page->index, priv);
} else { } else {
priv = afs_page_dirty(from, to); priv = afs_page_dirty(page, from, to);
attach_page_private(page, (void *)priv); attach_page_private(page, (void *)priv);
trace_afs_page_dirty(vnode, tracepoint_string("dirty"), trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page);
page->index, priv);
} }
set_page_dirty(page); if (set_page_dirty(page))
if (PageDirty(page)) _debug("dirtied %lx", page->index);
_debug("dirtied");
ret = copied;
out: out:
unlock_page(page); unlock_page(page);
put_page(page); put_page(page);
return ret; return copied;
} }
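
The dirty-range helpers now take the page because, with THPs, the number of bits needed to encode an intra-page offset depends on the page's size. A simplified sketch of the packing idea (demo_* names are illustrative; the real helpers also coarsen the resolution when an offset no longer fits in half a long, which is why they need the page argument this sketch leaves unused):

    /* Pack a dirty byte range [from, to) into page->private:
     * lower half of the word holds the start, upper half the end.
     */
    static unsigned long demo_page_dirty(struct page *page,
                                         unsigned long from, unsigned long to)
    {
            return (to << (BITS_PER_LONG / 2)) | from;
    }

    static unsigned long demo_page_dirty_from(struct page *page,
                                              unsigned long priv)
    {
            return priv & ((1UL << (BITS_PER_LONG / 2)) - 1);
    }

    static unsigned long demo_page_dirty_to(struct page *page,
                                            unsigned long priv)
    {
            return priv >> (BITS_PER_LONG / 2);
    }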
/* /*
* kill all the pages in the given range * kill all the pages in the given range
*/ */
static void afs_kill_pages(struct address_space *mapping, static void afs_kill_pages(struct address_space *mapping,
pgoff_t first, pgoff_t last) loff_t start, loff_t len)
{ {
struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct afs_vnode *vnode = AFS_FS_I(mapping->host);
struct pagevec pv; struct pagevec pv;
unsigned count, loop; unsigned int loop, psize;
_enter("{%llx:%llu},%lx-%lx", _enter("{%llx:%llu},%llx @%llx",
vnode->fid.vid, vnode->fid.vnode, first, last); vnode->fid.vid, vnode->fid.vnode, len, start);
pagevec_init(&pv); pagevec_init(&pv);
do { do {
_debug("kill %lx-%lx", first, last); _debug("kill %llx @%llx", len, start);
count = last - first + 1; pv.nr = find_get_pages_contig(mapping, start / PAGE_SIZE,
if (count > PAGEVEC_SIZE) PAGEVEC_SIZE, pv.pages);
count = PAGEVEC_SIZE; if (pv.nr == 0)
pv.nr = find_get_pages_contig(mapping, first, count, pv.pages); break;
ASSERTCMP(pv.nr, ==, count);
for (loop = 0; loop < count; loop++) { for (loop = 0; loop < pv.nr; loop++) {
struct page *page = pv.pages[loop]; struct page *page = pv.pages[loop];
if (page->index * PAGE_SIZE >= start + len)
break;
psize = thp_size(page);
start += psize;
len -= psize;
ClearPageUptodate(page); ClearPageUptodate(page);
SetPageError(page);
end_page_writeback(page); end_page_writeback(page);
if (page->index >= first)
first = page->index + 1;
lock_page(page); lock_page(page);
generic_error_remove_page(mapping, page); generic_error_remove_page(mapping, page);
unlock_page(page); unlock_page(page);
} }
__pagevec_release(&pv); __pagevec_release(&pv);
} while (first <= last); } while (len > 0);
_leave(""); _leave("");
} }
...@@ -279,37 +210,40 @@ static void afs_kill_pages(struct address_space *mapping, ...@@ -279,37 +210,40 @@ static void afs_kill_pages(struct address_space *mapping,
*/ */
static void afs_redirty_pages(struct writeback_control *wbc, static void afs_redirty_pages(struct writeback_control *wbc,
struct address_space *mapping, struct address_space *mapping,
pgoff_t first, pgoff_t last) loff_t start, loff_t len)
{ {
struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct afs_vnode *vnode = AFS_FS_I(mapping->host);
struct pagevec pv; struct pagevec pv;
unsigned count, loop; unsigned int loop, psize;
_enter("{%llx:%llu},%lx-%lx", _enter("{%llx:%llu},%llx @%llx",
vnode->fid.vid, vnode->fid.vnode, first, last); vnode->fid.vid, vnode->fid.vnode, len, start);
pagevec_init(&pv); pagevec_init(&pv);
do { do {
_debug("redirty %lx-%lx", first, last); _debug("redirty %llx @%llx", len, start);
count = last - first + 1; pv.nr = find_get_pages_contig(mapping, start / PAGE_SIZE,
if (count > PAGEVEC_SIZE) PAGEVEC_SIZE, pv.pages);
count = PAGEVEC_SIZE; if (pv.nr == 0)
pv.nr = find_get_pages_contig(mapping, first, count, pv.pages); break;
ASSERTCMP(pv.nr, ==, count);
for (loop = 0; loop < count; loop++) { for (loop = 0; loop < pv.nr; loop++) {
struct page *page = pv.pages[loop]; struct page *page = pv.pages[loop];
if (page->index * PAGE_SIZE >= start + len)
break;
psize = thp_size(page);
start += psize;
len -= psize;
redirty_page_for_writepage(wbc, page); redirty_page_for_writepage(wbc, page);
end_page_writeback(page); end_page_writeback(page);
if (page->index >= first)
first = page->index + 1;
} }
__pagevec_release(&pv); __pagevec_release(&pv);
} while (first <= last); } while (len > 0);
_leave(""); _leave("");
} }
...@@ -317,37 +251,32 @@ static void afs_redirty_pages(struct writeback_control *wbc, ...@@ -317,37 +251,32 @@ static void afs_redirty_pages(struct writeback_control *wbc,
/* /*
* completion of write to server * completion of write to server
*/ */
static void afs_pages_written_back(struct afs_vnode *vnode, static void afs_pages_written_back(struct afs_vnode *vnode, loff_t start, unsigned int len)
pgoff_t first, pgoff_t last)
{ {
struct pagevec pv; struct address_space *mapping = vnode->vfs_inode.i_mapping;
unsigned long priv; struct page *page;
unsigned count, loop; pgoff_t end;
_enter("{%llx:%llu},{%lx-%lx}", XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
vnode->fid.vid, vnode->fid.vnode, first, last);
pagevec_init(&pv); _enter("{%llx:%llu},{%x @%llx}",
vnode->fid.vid, vnode->fid.vnode, len, start);
do { rcu_read_lock();
_debug("done %lx-%lx", first, last);
end = (start + len - 1) / PAGE_SIZE;
count = last - first + 1; xas_for_each(&xas, page, end) {
if (count > PAGEVEC_SIZE) if (!PageWriteback(page)) {
count = PAGEVEC_SIZE; kdebug("bad %x @%llx page %lx %lx", len, start, page->index, end);
pv.nr = find_get_pages_contig(vnode->vfs_inode.i_mapping, ASSERT(PageWriteback(page));
first, count, pv.pages);
ASSERTCMP(pv.nr, ==, count);
for (loop = 0; loop < count; loop++) {
priv = (unsigned long)detach_page_private(pv.pages[loop]);
trace_afs_page_dirty(vnode, tracepoint_string("clear"),
pv.pages[loop]->index, priv);
end_page_writeback(pv.pages[loop]);
} }
first += count;
__pagevec_release(&pv); trace_afs_page_dirty(vnode, tracepoint_string("clear"), page);
} while (first <= last); detach_page_private(page);
page_endio(page, true, 0);
}
rcu_read_unlock();
afs_prune_wb_keys(vnode); afs_prune_wb_keys(vnode);
_leave(""); _leave("");
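
Completion handling now walks the written-back span with an xarray cursor instead of repeatedly filling a pagevec. The general shape of such a walk, assuming mapping, start and len are in scope:

    XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
    struct page *page;

    rcu_read_lock();
    xas_for_each(&xas, page, (start + len - 1) / PAGE_SIZE) {
            /* Pages are not pinned here: only touch state that is safe
             * under RCU, or take a reference before doing more.
             */
    }
    rcu_read_unlock();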
...@@ -402,11 +331,9 @@ static void afs_store_data_success(struct afs_operation *op) ...@@ -402,11 +331,9 @@ static void afs_store_data_success(struct afs_operation *op)
afs_vnode_commit_status(op, &op->file[0]); afs_vnode_commit_status(op, &op->file[0]);
if (op->error == 0) { if (op->error == 0) {
if (!op->store.laundering) if (!op->store.laundering)
afs_pages_written_back(vnode, op->store.first, op->store.last); afs_pages_written_back(vnode, op->store.pos, op->store.size);
afs_stat_v(vnode, n_stores); afs_stat_v(vnode, n_stores);
atomic_long_add((op->store.last * PAGE_SIZE + op->store.last_to) - atomic_long_add(op->store.size, &afs_v2net(vnode)->n_store_bytes);
(op->store.first * PAGE_SIZE + op->store.first_offset),
&afs_v2net(vnode)->n_store_bytes);
} }
} }
...@@ -419,21 +346,20 @@ static const struct afs_operation_ops afs_store_data_operation = { ...@@ -419,21 +346,20 @@ static const struct afs_operation_ops afs_store_data_operation = {
/* /*
* write to a file * write to a file
*/ */
static int afs_store_data(struct address_space *mapping, static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t pos,
pgoff_t first, pgoff_t last, bool laundering)
unsigned offset, unsigned to, bool laundering)
{ {
struct afs_vnode *vnode = AFS_FS_I(mapping->host);
struct afs_operation *op; struct afs_operation *op;
struct afs_wb_key *wbk = NULL; struct afs_wb_key *wbk = NULL;
int ret; loff_t size = iov_iter_count(iter), i_size;
int ret = -ENOKEY;
_enter("%s{%llx:%llu.%u},%lx,%lx,%x,%x", _enter("%s{%llx:%llu.%u},%llx,%llx",
vnode->volume->name, vnode->volume->name,
vnode->fid.vid, vnode->fid.vid,
vnode->fid.vnode, vnode->fid.vnode,
vnode->fid.unique, vnode->fid.unique,
first, last, offset, to); size, pos);
ret = afs_get_writeback_key(vnode, &wbk); ret = afs_get_writeback_key(vnode, &wbk);
if (ret) { if (ret) {
...@@ -447,13 +373,14 @@ static int afs_store_data(struct address_space *mapping, ...@@ -447,13 +373,14 @@ static int afs_store_data(struct address_space *mapping,
return -ENOMEM; return -ENOMEM;
} }
i_size = i_size_read(&vnode->vfs_inode);
afs_op_set_vnode(op, 0, vnode); afs_op_set_vnode(op, 0, vnode);
op->file[0].dv_delta = 1; op->file[0].dv_delta = 1;
op->store.mapping = mapping; op->store.write_iter = iter;
op->store.first = first; op->store.pos = pos;
op->store.last = last; op->store.size = size;
op->store.first_offset = offset; op->store.i_size = max(pos + size, i_size);
op->store.last_to = to;
op->store.laundering = laundering; op->store.laundering = laundering;
op->mtime = vnode->vfs_inode.i_mtime; op->mtime = vnode->vfs_inode.i_mtime;
op->flags |= AFS_OPERATION_UNINTR; op->flags |= AFS_OPERATION_UNINTR;
...@@ -487,73 +414,58 @@ static int afs_store_data(struct address_space *mapping, ...@@ -487,73 +414,58 @@ static int afs_store_data(struct address_space *mapping,
} }
/* /*
* Synchronously write back the locked page and any subsequent non-locked dirty * Extend the region to be written back to include subsequent contiguously
* pages. * dirty pages if possible, but don't sleep while doing so.
*
* If this page holds new content, then we can include filler zeros in the
* writeback.
*/ */
static int afs_write_back_from_locked_page(struct address_space *mapping, static void afs_extend_writeback(struct address_space *mapping,
struct writeback_control *wbc, struct afs_vnode *vnode,
struct page *primary_page, long *_count,
pgoff_t final_page) loff_t start,
loff_t max_len,
bool new_content,
unsigned int *_len)
{ {
struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct pagevec pvec;
struct page *pages[8], *page; struct page *page;
unsigned long count, priv; unsigned long priv;
unsigned n, offset, to, f, t; unsigned int psize, filler = 0;
pgoff_t start, first, last; unsigned int f, t;
loff_t i_size, end; loff_t len = *_len;
int loop, ret; pgoff_t index = (start + len) / PAGE_SIZE;
bool stop = true;
_enter(",%lx", primary_page->index); unsigned int i;
count = 1; XA_STATE(xas, &mapping->i_pages, index);
if (test_set_page_writeback(primary_page)) pagevec_init(&pvec);
BUG();
/* Find all consecutive lockable dirty pages that have contiguous
* written regions, stopping when we find a page that is not
* immediately lockable, is not dirty or is missing, or we reach the
* end of the range.
*/
start = primary_page->index;
priv = page_private(primary_page);
offset = afs_page_dirty_from(priv);
to = afs_page_dirty_to(priv);
trace_afs_page_dirty(vnode, tracepoint_string("store"),
primary_page->index, priv);
WARN_ON(offset == to);
if (offset == to)
trace_afs_page_dirty(vnode, tracepoint_string("WARN"),
primary_page->index, priv);
if (start >= final_page ||
(to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)))
goto no_more;
start++;
do { do {
_debug("more %lx [%lx]", start, count); /* Firstly, we gather up a batch of contiguous dirty pages
n = final_page - start + 1; * under the RCU read lock - but we can't clear the dirty flags
if (n > ARRAY_SIZE(pages)) * there if any of those pages are mapped.
n = ARRAY_SIZE(pages); */
n = find_get_pages_contig(mapping, start, ARRAY_SIZE(pages), pages); rcu_read_lock();
_debug("fgpc %u", n);
if (n == 0)
goto no_more;
if (pages[0]->index != start) {
do {
put_page(pages[--n]);
} while (n > 0);
goto no_more;
}
for (loop = 0; loop < n; loop++) { xas_for_each(&xas, page, ULONG_MAX) {
page = pages[loop]; stop = true;
if (to != PAGE_SIZE && if (xas_retry(&xas, page))
!test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) continue;
if (xa_is_value(page))
break; break;
if (page->index > final_page) if (page->index != index)
break; break;
if (!page_cache_get_speculative(page)) {
xas_reset(&xas);
continue;
}
/* Has the page moved or been split? */
if (unlikely(page != xas_reload(&xas)))
break;
if (!trylock_page(page)) if (!trylock_page(page))
break; break;
if (!PageDirty(page) || PageWriteback(page)) { if (!PageDirty(page) || PageWriteback(page)) {
...@@ -561,57 +473,134 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, ...@@ -561,57 +473,134 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
break; break;
} }
psize = thp_size(page);
priv = page_private(page); priv = page_private(page);
f = afs_page_dirty_from(priv); f = afs_page_dirty_from(page, priv);
t = afs_page_dirty_to(priv); t = afs_page_dirty_to(page, priv);
if (f != 0 && if (f != 0 && !new_content) {
!test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) {
unlock_page(page); unlock_page(page);
break; break;
} }
to = t;
trace_afs_page_dirty(vnode, tracepoint_string("store+"), len += filler + t;
page->index, priv); filler = psize - t;
if (len >= max_len || *_count <= 0)
stop = true;
else if (t == psize || new_content)
stop = false;
index += thp_nr_pages(page);
if (!pagevec_add(&pvec, page))
break;
if (stop)
break;
}
if (!stop)
xas_pause(&xas);
rcu_read_unlock();
/* Now, if we obtained any pages, we can shift them to being
* writable and mark them for caching.
*/
if (!pagevec_count(&pvec))
break;
for (i = 0; i < pagevec_count(&pvec); i++) {
page = pvec.pages[i];
trace_afs_page_dirty(vnode, tracepoint_string("store+"), page);
if (!clear_page_dirty_for_io(page)) if (!clear_page_dirty_for_io(page))
BUG(); BUG();
if (test_set_page_writeback(page)) if (test_set_page_writeback(page))
BUG(); BUG();
*_count -= thp_nr_pages(page);
unlock_page(page); unlock_page(page);
put_page(page);
}
count += loop;
if (loop < n) {
for (; loop < n; loop++)
put_page(pages[loop]);
goto no_more;
} }
start += loop; pagevec_release(&pvec);
} while (start <= final_page && count < 65536); cond_resched();
} while (!stop);
*_len = len;
}
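
The gathering loop above can only trylock and take speculative references while under RCU; clearing dirty flags and setting writeback happen afterwards, on the pinned pagevec batch. The generic pin-then-recheck idiom it relies on looks like this:

    if (!page_cache_get_speculative(page)) {    /* ref may race with a free */
            xas_reset(&xas);                    /* re-walk this index */
            continue;
    }
    if (unlikely(page != xas_reload(&xas))) {   /* moved or split meanwhile? */
            put_page(page);
            break;
    }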
/*
* Synchronously write back the locked page and any subsequent non-locked dirty
* pages.
*/
static ssize_t afs_write_back_from_locked_page(struct address_space *mapping,
struct writeback_control *wbc,
struct page *page,
loff_t start, loff_t end)
{
struct afs_vnode *vnode = AFS_FS_I(mapping->host);
struct iov_iter iter;
unsigned long priv;
unsigned int offset, to, len, max_len;
loff_t i_size = i_size_read(&vnode->vfs_inode);
bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
long count = wbc->nr_to_write;
int ret;
_enter(",%lx,%llx-%llx", page->index, start, end);
if (test_set_page_writeback(page))
BUG();
count -= thp_nr_pages(page);
/* Find all consecutive lockable dirty pages that have contiguous
* written regions, stopping when we find a page that is not
* immediately lockable, is not dirty or is missing, or we reach the
* end of the range.
*/
priv = page_private(page);
offset = afs_page_dirty_from(page, priv);
to = afs_page_dirty_to(page, priv);
trace_afs_page_dirty(vnode, tracepoint_string("store"), page);
len = to - offset;
start += offset;
if (start < i_size) {
/* Trim the write to the EOF; the extra data is ignored. Also
* put an upper limit on the size of a single storedata op.
*/
max_len = 65536 * 4096;
max_len = min_t(unsigned long long, max_len, end - start + 1);
max_len = min_t(unsigned long long, max_len, i_size - start);
if (len < max_len &&
(to == thp_size(page) || new_content))
afs_extend_writeback(mapping, vnode, &count,
start, max_len, new_content, &len);
len = min_t(loff_t, len, max_len);
}
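
So a single StoreData op is capped at 65536 pages (256MiB with 4KiB pages) and trimmed to both the requested range and the EOF. A worked example with illustrative numbers:

    /* i_size = 0x500000 (5MiB), start = 0x4ff000, end = LLONG_MAX:
     *   max_len = min(0x10000000, end - start + 1, i_size - start) = 0x1000
     * so only the last 4KiB before EOF is written; anything dirtied
     * entirely beyond the EOF takes the "write discard" path below.
     */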
no_more:
/* We now have a contiguous set of dirty pages, each with writeback /* We now have a contiguous set of dirty pages, each with writeback
* set; the first page is still locked at this point, but all the rest * set; the first page is still locked at this point, but all the rest
* have been unlocked. * have been unlocked.
*/ */
unlock_page(primary_page); unlock_page(page);
first = primary_page->index; if (start < i_size) {
last = first + count - 1; _debug("write back %x @%llx [%llx]", len, start, i_size);
end = (loff_t)last * PAGE_SIZE + to; iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len);
i_size = i_size_read(&vnode->vfs_inode); ret = afs_store_data(vnode, &iter, start, false);
} else {
_debug("write discard %x @%llx [%llx]", len, start, i_size);
_debug("write back %lx[%u..] to %lx[..%u]", first, offset, last, to); /* The dirty region was entirely beyond the EOF. */
if (end > i_size) afs_pages_written_back(vnode, start, len);
to = i_size & ~PAGE_MASK; ret = 0;
}
ret = afs_store_data(mapping, first, last, offset, to, false);
switch (ret) { switch (ret) {
case 0: case 0:
ret = count; wbc->nr_to_write = count;
ret = len;
break; break;
default: default:
...@@ -623,13 +612,13 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, ...@@ -623,13 +612,13 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
case -EKEYEXPIRED: case -EKEYEXPIRED:
case -EKEYREJECTED: case -EKEYREJECTED:
case -EKEYREVOKED: case -EKEYREVOKED:
afs_redirty_pages(wbc, mapping, first, last); afs_redirty_pages(wbc, mapping, start, len);
mapping_set_error(mapping, ret); mapping_set_error(mapping, ret);
break; break;
case -EDQUOT: case -EDQUOT:
case -ENOSPC: case -ENOSPC:
afs_redirty_pages(wbc, mapping, first, last); afs_redirty_pages(wbc, mapping, start, len);
mapping_set_error(mapping, -ENOSPC); mapping_set_error(mapping, -ENOSPC);
break; break;
...@@ -641,7 +630,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, ...@@ -641,7 +630,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
case -ENOMEDIUM: case -ENOMEDIUM:
case -ENXIO: case -ENXIO:
trace_afs_file_error(vnode, ret, afs_file_error_writeback_fail); trace_afs_file_error(vnode, ret, afs_file_error_writeback_fail);
afs_kill_pages(mapping, first, last); afs_kill_pages(mapping, start, len);
mapping_set_error(mapping, ret); mapping_set_error(mapping, ret);
break; break;
} }
...@@ -656,19 +645,19 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, ...@@ -656,19 +645,19 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
*/ */
int afs_writepage(struct page *page, struct writeback_control *wbc) int afs_writepage(struct page *page, struct writeback_control *wbc)
{ {
int ret; ssize_t ret;
loff_t start;
_enter("{%lx},", page->index); _enter("{%lx},", page->index);
start = page->index * PAGE_SIZE;
ret = afs_write_back_from_locked_page(page->mapping, wbc, page, ret = afs_write_back_from_locked_page(page->mapping, wbc, page,
wbc->range_end >> PAGE_SHIFT); start, LLONG_MAX - start);
if (ret < 0) { if (ret < 0) {
_leave(" = %d", ret); _leave(" = %zd", ret);
return 0; return ret;
} }
wbc->nr_to_write -= ret;
_leave(" = 0"); _leave(" = 0");
return 0; return 0;
} }
...@@ -678,35 +667,46 @@ int afs_writepage(struct page *page, struct writeback_control *wbc) ...@@ -678,35 +667,46 @@ int afs_writepage(struct page *page, struct writeback_control *wbc)
*/ */
static int afs_writepages_region(struct address_space *mapping, static int afs_writepages_region(struct address_space *mapping,
struct writeback_control *wbc, struct writeback_control *wbc,
pgoff_t index, pgoff_t end, pgoff_t *_next) loff_t start, loff_t end, loff_t *_next)
{ {
struct page *page; struct page *page;
int ret, n; ssize_t ret;
int n;
_enter(",,%lx,%lx,", index, end); _enter("%llx,%llx,", start, end);
do { do {
n = find_get_pages_range_tag(mapping, &index, end, pgoff_t index = start / PAGE_SIZE;
PAGECACHE_TAG_DIRTY, 1, &page);
n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
PAGECACHE_TAG_DIRTY, 1, &page);
if (!n) if (!n)
break; break;
start = (loff_t)page->index * PAGE_SIZE; /* May regress with THPs */
_debug("wback %lx", page->index); _debug("wback %lx", page->index);
/* /* At this point we hold neither the i_pages lock nor the
* at this point we hold neither the i_pages lock nor the
* page lock: the page may be truncated or invalidated * page lock: the page may be truncated or invalidated
* (changing page->mapping to NULL), or even swizzled * (changing page->mapping to NULL), or even swizzled
* back from swapper_space to tmpfs file mapping * back from swapper_space to tmpfs file mapping
*/ */
ret = lock_page_killable(page); if (wbc->sync_mode != WB_SYNC_NONE) {
if (ret < 0) { ret = lock_page_killable(page);
put_page(page); if (ret < 0) {
_leave(" = %d", ret); put_page(page);
return ret; return ret;
}
} else {
if (!trylock_page(page)) {
put_page(page);
return 0;
}
} }
if (page->mapping != mapping || !PageDirty(page)) { if (page->mapping != mapping || !PageDirty(page)) {
start += thp_size(page);
unlock_page(page); unlock_page(page);
put_page(page); put_page(page);
continue; continue;
...@@ -722,20 +722,20 @@ static int afs_writepages_region(struct address_space *mapping, ...@@ -722,20 +722,20 @@ static int afs_writepages_region(struct address_space *mapping,
if (!clear_page_dirty_for_io(page)) if (!clear_page_dirty_for_io(page))
BUG(); BUG();
ret = afs_write_back_from_locked_page(mapping, wbc, page, end); ret = afs_write_back_from_locked_page(mapping, wbc, page, start, end);
put_page(page); put_page(page);
if (ret < 0) { if (ret < 0) {
_leave(" = %d", ret); _leave(" = %zd", ret);
return ret; return ret;
} }
wbc->nr_to_write -= ret; start += ret * PAGE_SIZE;
cond_resched(); cond_resched();
} while (index < end && wbc->nr_to_write > 0); } while (wbc->nr_to_write > 0);
*_next = index; *_next = start;
_leave(" = 0 [%lx]", *_next); _leave(" = 0 [%llx]", *_next);
return 0; return 0;
} }
...@@ -746,7 +746,7 @@ int afs_writepages(struct address_space *mapping, ...@@ -746,7 +746,7 @@ int afs_writepages(struct address_space *mapping,
struct writeback_control *wbc) struct writeback_control *wbc)
{ {
struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct afs_vnode *vnode = AFS_FS_I(mapping->host);
pgoff_t start, end, next; loff_t start, next;
int ret; int ret;
_enter(""); _enter("");
...@@ -761,22 +761,19 @@ int afs_writepages(struct address_space *mapping, ...@@ -761,22 +761,19 @@ int afs_writepages(struct address_space *mapping,
return 0; return 0;
if (wbc->range_cyclic) { if (wbc->range_cyclic) {
start = mapping->writeback_index; start = mapping->writeback_index * PAGE_SIZE;
end = -1; ret = afs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
ret = afs_writepages_region(mapping, wbc, start, end, &next);
if (start > 0 && wbc->nr_to_write > 0 && ret == 0) if (start > 0 && wbc->nr_to_write > 0 && ret == 0)
ret = afs_writepages_region(mapping, wbc, 0, start, ret = afs_writepages_region(mapping, wbc, 0, start,
&next); &next);
mapping->writeback_index = next; mapping->writeback_index = next / PAGE_SIZE;
} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
end = (pgoff_t)(LLONG_MAX >> PAGE_SHIFT); ret = afs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
ret = afs_writepages_region(mapping, wbc, 0, end, &next);
if (wbc->nr_to_write > 0) if (wbc->nr_to_write > 0)
mapping->writeback_index = next; mapping->writeback_index = next;
} else { } else {
start = wbc->range_start >> PAGE_SHIFT; ret = afs_writepages_region(mapping, wbc,
end = wbc->range_end >> PAGE_SHIFT; wbc->range_start, wbc->range_end, &next);
ret = afs_writepages_region(mapping, wbc, start, end, &next);
} }
up_read(&vnode->validate_lock); up_read(&vnode->validate_lock);
...@@ -834,13 +831,13 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync) ...@@ -834,13 +831,13 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
*/ */
vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
{ {
struct page *page = thp_head(vmf->page);
struct file *file = vmf->vma->vm_file; struct file *file = vmf->vma->vm_file;
struct inode *inode = file_inode(file); struct inode *inode = file_inode(file);
struct afs_vnode *vnode = AFS_FS_I(inode); struct afs_vnode *vnode = AFS_FS_I(inode);
unsigned long priv; unsigned long priv;
_enter("{{%llx:%llu}},{%lx}", _enter("{{%llx:%llu}},{%lx}", vnode->fid.vid, vnode->fid.vnode, page->index);
vnode->fid.vid, vnode->fid.vnode, vmf->page->index);
sb_start_pagefault(inode->i_sb); sb_start_pagefault(inode->i_sb);
...@@ -848,29 +845,35 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) ...@@ -848,29 +845,35 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
* be modified. We then assume the entire page will need writing back. * be modified. We then assume the entire page will need writing back.
*/ */
#ifdef CONFIG_AFS_FSCACHE #ifdef CONFIG_AFS_FSCACHE
fscache_wait_on_page_write(vnode->cache, vmf->page); if (PageFsCache(page) &&
wait_on_page_fscache_killable(page) < 0)
return VM_FAULT_RETRY;
#endif #endif
if (wait_on_page_writeback_killable(vmf->page)) if (wait_on_page_writeback_killable(page))
return VM_FAULT_RETRY; return VM_FAULT_RETRY;
if (lock_page_killable(vmf->page) < 0) if (lock_page_killable(page) < 0)
return VM_FAULT_RETRY; return VM_FAULT_RETRY;
/* We mustn't change page->private until writeback is complete as that /* We mustn't change page->private until writeback is complete as that
* details the portion of the page we need to write back and we might * details the portion of the page we need to write back and we might
* need to redirty the page if there's a problem. * need to redirty the page if there's a problem.
*/ */
wait_on_page_writeback(vmf->page); if (wait_on_page_writeback_killable(page) < 0) {
unlock_page(page);
return VM_FAULT_RETRY;
}
priv = afs_page_dirty(0, PAGE_SIZE); priv = afs_page_dirty(page, 0, thp_size(page));
priv = afs_page_dirty_mmapped(priv); priv = afs_page_dirty_mmapped(priv);
trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), if (PagePrivate(page)) {
vmf->page->index, priv); set_page_private(page, priv);
if (PagePrivate(vmf->page)) trace_afs_page_dirty(vnode, tracepoint_string("mkwrite+"), page);
set_page_private(vmf->page, priv); } else {
else attach_page_private(page, (void *)priv);
attach_page_private(vmf->page, (void *)priv); trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), page);
}
file_update_time(file); file_update_time(file);
sb_end_pagefault(inode->i_sb); sb_end_pagefault(inode->i_sb);
...@@ -912,6 +915,8 @@ int afs_launder_page(struct page *page) ...@@ -912,6 +915,8 @@ int afs_launder_page(struct page *page)
{ {
struct address_space *mapping = page->mapping; struct address_space *mapping = page->mapping;
struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct afs_vnode *vnode = AFS_FS_I(mapping->host);
struct iov_iter iter;
struct bio_vec bv[1];
unsigned long priv; unsigned long priv;
unsigned int f, t; unsigned int f, t;
int ret = 0; int ret = 0;
...@@ -921,26 +926,24 @@ int afs_launder_page(struct page *page) ...@@ -921,26 +926,24 @@ int afs_launder_page(struct page *page)
priv = page_private(page); priv = page_private(page);
if (clear_page_dirty_for_io(page)) { if (clear_page_dirty_for_io(page)) {
f = 0; f = 0;
t = PAGE_SIZE; t = thp_size(page);
if (PagePrivate(page)) { if (PagePrivate(page)) {
f = afs_page_dirty_from(priv); f = afs_page_dirty_from(page, priv);
t = afs_page_dirty_to(priv); t = afs_page_dirty_to(page, priv);
} }
trace_afs_page_dirty(vnode, tracepoint_string("launder"), bv[0].bv_page = page;
page->index, priv); bv[0].bv_offset = f;
ret = afs_store_data(mapping, page->index, page->index, t, f, true); bv[0].bv_len = t - f;
} iov_iter_bvec(&iter, WRITE, bv, 1, bv[0].bv_len);
priv = (unsigned long)detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("laundered"),
page->index, priv);
#ifdef CONFIG_AFS_FSCACHE trace_afs_page_dirty(vnode, tracepoint_string("launder"), page);
if (PageFsCache(page)) { ret = afs_store_data(vnode, &iter, (loff_t)page->index * PAGE_SIZE,
fscache_wait_on_page_write(vnode->cache, page); true);
fscache_uncache_page(vnode->cache, page);
} }
#endif
trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
detach_page_private(page);
wait_on_page_fscache(page);
return ret; return ret;
} }
...@@ -360,22 +360,23 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) ...@@ -360,22 +360,23 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
struct afs_vnode_param *vp = &op->file[0]; struct afs_vnode_param *vp = &op->file[0];
struct afs_read *req = op->fetch.req; struct afs_read *req = op->fetch.req;
const __be32 *bp; const __be32 *bp;
unsigned int size;
int ret; int ret;
_enter("{%u,%zu/%llu}", _enter("{%u,%zu, %zu/%llu}",
call->unmarshall, iov_iter_count(call->iter), req->actual_len); call->unmarshall, call->iov_len, iov_iter_count(call->iter),
req->actual_len);
switch (call->unmarshall) { switch (call->unmarshall) {
case 0: case 0:
req->actual_len = 0; req->actual_len = 0;
req->index = 0;
req->offset = req->pos & (PAGE_SIZE - 1);
afs_extract_to_tmp64(call); afs_extract_to_tmp64(call);
call->unmarshall++; call->unmarshall++;
fallthrough; fallthrough;
/* extract the returned data length */ /* Extract the returned data length into ->actual_len. This
* may indicate more or less data than was requested will be
* returned.
*/
case 1: case 1:
_debug("extract data length"); _debug("extract data length");
ret = afs_extract_data(call, true); ret = afs_extract_data(call, true);
...@@ -384,44 +385,25 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) ...@@ -384,44 +385,25 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
req->actual_len = be64_to_cpu(call->tmp64); req->actual_len = be64_to_cpu(call->tmp64);
_debug("DATA length: %llu", req->actual_len); _debug("DATA length: %llu", req->actual_len);
req->remain = min(req->len, req->actual_len);
if (req->remain == 0) if (req->actual_len == 0)
goto no_more_data; goto no_more_data;
call->iter = req->iter;
call->iov_len = min(req->actual_len, req->len);
call->unmarshall++; call->unmarshall++;
begin_page:
ASSERTCMP(req->index, <, req->nr_pages);
if (req->remain > PAGE_SIZE - req->offset)
size = PAGE_SIZE - req->offset;
else
size = req->remain;
call->bvec[0].bv_len = size;
call->bvec[0].bv_offset = req->offset;
call->bvec[0].bv_page = req->pages[req->index];
iov_iter_bvec(&call->def_iter, READ, call->bvec, 1, size);
ASSERTCMP(size, <=, PAGE_SIZE);
fallthrough; fallthrough;
/* extract the returned data */ /* extract the returned data */
case 2: case 2:
_debug("extract data %zu/%llu", _debug("extract data %zu/%llu",
iov_iter_count(call->iter), req->remain); iov_iter_count(call->iter), req->actual_len);
ret = afs_extract_data(call, true); ret = afs_extract_data(call, true);
if (ret < 0) if (ret < 0)
return ret; return ret;
req->remain -= call->bvec[0].bv_len;
req->offset += call->bvec[0].bv_len;
ASSERTCMP(req->offset, <=, PAGE_SIZE);
if (req->offset == PAGE_SIZE) {
req->offset = 0;
req->index++;
if (req->remain > 0)
goto begin_page;
}
ASSERTCMP(req->remain, ==, 0); call->iter = &call->def_iter;
if (req->actual_len <= req->len) if (req->actual_len <= req->len)
goto no_more_data; goto no_more_data;
...@@ -467,17 +449,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) ...@@ -467,17 +449,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
break; break;
} }
for (; req->index < req->nr_pages; req->index++) {
if (req->offset < PAGE_SIZE)
zero_user_segment(req->pages[req->index],
req->offset, PAGE_SIZE);
req->offset = 0;
}
if (req->page_done)
for (req->index = 0; req->index < req->nr_pages; req->index++)
req->page_done(req);
_leave(" = 0 [done]"); _leave(" = 0 [done]");
return 0; return 0;
} }
...@@ -516,6 +487,8 @@ void yfs_fs_fetch_data(struct afs_operation *op) ...@@ -516,6 +487,8 @@ void yfs_fs_fetch_data(struct afs_operation *op)
if (!call) if (!call)
return afs_op_nomem(op); return afs_op_nomem(op);
req->call_debug_id = call->debug_id;
/* marshall the parameters */ /* marshall the parameters */
bp = call->request; bp = call->request;
bp = xdr_encode_u32(bp, YFSFETCHDATA64); bp = xdr_encode_u32(bp, YFSFETCHDATA64);
...@@ -1102,25 +1075,15 @@ void yfs_fs_store_data(struct afs_operation *op) ...@@ -1102,25 +1075,15 @@ void yfs_fs_store_data(struct afs_operation *op)
{ {
struct afs_vnode_param *vp = &op->file[0]; struct afs_vnode_param *vp = &op->file[0];
struct afs_call *call; struct afs_call *call;
loff_t size, pos, i_size;
__be32 *bp; __be32 *bp;
_enter(",%x,{%llx:%llu},,", _enter(",%x,{%llx:%llu},,",
key_serial(op->key), vp->fid.vid, vp->fid.vnode); key_serial(op->key), vp->fid.vid, vp->fid.vnode);
size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset;
if (op->store.first != op->store.last)
size += (loff_t)(op->store.last - op->store.first) << PAGE_SHIFT;
pos = (loff_t)op->store.first << PAGE_SHIFT;
pos += op->store.first_offset;
i_size = i_size_read(&vp->vnode->vfs_inode);
if (pos + size > i_size)
i_size = size + pos;
_debug("size %llx, at %llx, i_size %llx", _debug("size %llx, at %llx, i_size %llx",
(unsigned long long)size, (unsigned long long)pos, (unsigned long long)op->store.size,
(unsigned long long)i_size); (unsigned long long)op->store.pos,
(unsigned long long)op->store.i_size);
call = afs_alloc_flat_call(op->net, &yfs_RXYFSStoreData64, call = afs_alloc_flat_call(op->net, &yfs_RXYFSStoreData64,
sizeof(__be32) + sizeof(__be32) +
...@@ -1133,8 +1096,7 @@ void yfs_fs_store_data(struct afs_operation *op) ...@@ -1133,8 +1096,7 @@ void yfs_fs_store_data(struct afs_operation *op)
if (!call) if (!call)
return afs_op_nomem(op); return afs_op_nomem(op);
call->key = op->key; call->write_iter = op->store.write_iter;
call->send_pages = true;
/* marshall the parameters */ /* marshall the parameters */
bp = call->request; bp = call->request;
...@@ -1142,9 +1104,9 @@ void yfs_fs_store_data(struct afs_operation *op) ...@@ -1142,9 +1104,9 @@ void yfs_fs_store_data(struct afs_operation *op)
bp = xdr_encode_u32(bp, 0); /* RPC flags */ bp = xdr_encode_u32(bp, 0); /* RPC flags */
bp = xdr_encode_YFSFid(bp, &vp->fid); bp = xdr_encode_YFSFid(bp, &vp->fid);
bp = xdr_encode_YFSStoreStatus_mtime(bp, &op->mtime); bp = xdr_encode_YFSStoreStatus_mtime(bp, &op->mtime);
bp = xdr_encode_u64(bp, pos); bp = xdr_encode_u64(bp, op->store.pos);
bp = xdr_encode_u64(bp, size); bp = xdr_encode_u64(bp, op->store.size);
bp = xdr_encode_u64(bp, i_size); bp = xdr_encode_u64(bp, op->store.i_size);
yfs_check_req(call, bp); yfs_check_req(call, bp);
trace_afs_make_fs_call(call, &vp->fid); trace_afs_make_fs_call(call, &vp->fid);
......
...@@ -53,7 +53,7 @@ int rxrpc_kernel_send_data(struct socket *, struct rxrpc_call *, ...@@ -53,7 +53,7 @@ int rxrpc_kernel_send_data(struct socket *, struct rxrpc_call *,
struct msghdr *, size_t, struct msghdr *, size_t,
rxrpc_notify_end_tx_t); rxrpc_notify_end_tx_t);
int rxrpc_kernel_recv_data(struct socket *, struct rxrpc_call *, int rxrpc_kernel_recv_data(struct socket *, struct rxrpc_call *,
struct iov_iter *, bool, u32 *, u16 *); struct iov_iter *, size_t *, bool, u32 *, u16 *);
bool rxrpc_kernel_abort_call(struct socket *, struct rxrpc_call *, bool rxrpc_kernel_abort_call(struct socket *, struct rxrpc_call *,
u32, int, const char *); u32, int, const char *);
void rxrpc_kernel_end_call(struct socket *, struct rxrpc_call *); void rxrpc_kernel_end_call(struct socket *, struct rxrpc_call *);
......
...@@ -886,65 +886,52 @@ TRACE_EVENT(afs_call_done, ...@@ -886,65 +886,52 @@ TRACE_EVENT(afs_call_done,
__entry->rx_call) __entry->rx_call)
); );
TRACE_EVENT(afs_send_pages, TRACE_EVENT(afs_send_data,
TP_PROTO(struct afs_call *call, struct msghdr *msg, TP_PROTO(struct afs_call *call, struct msghdr *msg),
pgoff_t first, pgoff_t last, unsigned int offset),
TP_ARGS(call, msg, first, last, offset), TP_ARGS(call, msg),
TP_STRUCT__entry( TP_STRUCT__entry(
__field(unsigned int, call ) __field(unsigned int, call )
__field(pgoff_t, first )
__field(pgoff_t, last )
__field(unsigned int, nr )
__field(unsigned int, bytes )
__field(unsigned int, offset )
__field(unsigned int, flags ) __field(unsigned int, flags )
__field(loff_t, offset )
__field(loff_t, count )
), ),
TP_fast_assign( TP_fast_assign(
__entry->call = call->debug_id; __entry->call = call->debug_id;
__entry->first = first;
__entry->last = last;
__entry->nr = msg->msg_iter.nr_segs;
__entry->bytes = msg->msg_iter.count;
__entry->offset = offset;
__entry->flags = msg->msg_flags; __entry->flags = msg->msg_flags;
__entry->offset = msg->msg_iter.xarray_start + msg->msg_iter.iov_offset;
__entry->count = iov_iter_count(&msg->msg_iter);
), ),
TP_printk(" c=%08x %lx-%lx-%lx b=%x o=%x f=%x", TP_printk(" c=%08x o=%llx n=%llx f=%x",
__entry->call, __entry->call, __entry->offset, __entry->count,
__entry->first, __entry->first + __entry->nr - 1, __entry->last,
__entry->bytes, __entry->offset,
__entry->flags) __entry->flags)
); );
TRACE_EVENT(afs_sent_pages, TRACE_EVENT(afs_sent_data,
TP_PROTO(struct afs_call *call, pgoff_t first, pgoff_t last, TP_PROTO(struct afs_call *call, struct msghdr *msg, int ret),
pgoff_t cursor, int ret),
TP_ARGS(call, first, last, cursor, ret), TP_ARGS(call, msg, ret),
TP_STRUCT__entry( TP_STRUCT__entry(
__field(unsigned int, call ) __field(unsigned int, call )
__field(pgoff_t, first )
__field(pgoff_t, last )
__field(pgoff_t, cursor )
__field(int, ret ) __field(int, ret )
__field(loff_t, offset )
__field(loff_t, count )
), ),
TP_fast_assign( TP_fast_assign(
__entry->call = call->debug_id; __entry->call = call->debug_id;
__entry->first = first;
__entry->last = last;
__entry->cursor = cursor;
__entry->ret = ret; __entry->ret = ret;
__entry->offset = msg->msg_iter.xarray_start + msg->msg_iter.iov_offset;
__entry->count = iov_iter_count(&msg->msg_iter);
), ),
TP_printk(" c=%08x %lx-%lx c=%lx r=%d", TP_printk(" c=%08x o=%llx n=%llx r=%x",
__entry->call, __entry->call, __entry->offset, __entry->count,
__entry->first, __entry->last, __entry->ret)
__entry->cursor, __entry->ret)
); );
TRACE_EVENT(afs_dir_check_failed, TRACE_EVENT(afs_dir_check_failed,
...@@ -969,30 +956,33 @@ TRACE_EVENT(afs_dir_check_failed, ...@@ -969,30 +956,33 @@ TRACE_EVENT(afs_dir_check_failed,
); );
TRACE_EVENT(afs_page_dirty, TRACE_EVENT(afs_page_dirty,
TP_PROTO(struct afs_vnode *vnode, const char *where, TP_PROTO(struct afs_vnode *vnode, const char *where, struct page *page),
pgoff_t page, unsigned long priv),
TP_ARGS(vnode, where, page, priv), TP_ARGS(vnode, where, page),
TP_STRUCT__entry( TP_STRUCT__entry(
__field(struct afs_vnode *, vnode ) __field(struct afs_vnode *, vnode )
__field(const char *, where ) __field(const char *, where )
__field(pgoff_t, page ) __field(pgoff_t, page )
__field(unsigned long, priv ) __field(unsigned long, from )
__field(unsigned long, to )
), ),
TP_fast_assign( TP_fast_assign(
__entry->vnode = vnode; __entry->vnode = vnode;
__entry->where = where; __entry->where = where;
__entry->page = page; __entry->page = page->index;
__entry->priv = priv; __entry->from = afs_page_dirty_from(page, page->private);
__entry->to = afs_page_dirty_to(page, page->private);
__entry->to |= (afs_is_page_dirty_mmapped(page->private) ?
(1UL << (BITS_PER_LONG - 1)) : 0);
), ),
TP_printk("vn=%p %lx %s %zx-%zx%s", TP_printk("vn=%p %lx %s %lx-%lx%s",
__entry->vnode, __entry->page, __entry->where, __entry->vnode, __entry->page, __entry->where,
afs_page_dirty_from(__entry->priv), __entry->from,
afs_page_dirty_to(__entry->priv), __entry->to & ~(1UL << (BITS_PER_LONG - 1)),
afs_is_page_dirty_mmapped(__entry->priv) ? " M" : "") __entry->to & (1UL << (BITS_PER_LONG - 1)) ? " M" : "")
); );
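
The reworked afs_page_dirty event computes the bounds from page->private at trace time and smuggles the mmapped flag into the top bit of the "to" field. Decoding it back, as the TP_printk above does ("entry" stands for the recorded trace fields; names illustrative):

    unsigned long to_raw = entry->to;                       /* as recorded */
    bool mmapped         = to_raw &  (1UL << (BITS_PER_LONG - 1));
    unsigned long to     = to_raw & ~(1UL << (BITS_PER_LONG - 1));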
TRACE_EVENT(afs_call_state, TRACE_EVENT(afs_call_state,
......
...@@ -669,6 +669,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, ...@@ -669,6 +669,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
* @sock: The socket that the call exists on * @sock: The socket that the call exists on
* @call: The call to send data through * @call: The call to send data through
* @iter: The buffer to receive into * @iter: The buffer to receive into
* @_len: The amount of data we want to receive (decreased on return)
* @want_more: True if more data is expected to be read * @want_more: True if more data is expected to be read
* @_abort: Where the abort code is stored if -ECONNABORTED is returned * @_abort: Where the abort code is stored if -ECONNABORTED is returned
* @_service: Where to store the actual service ID (may be upgraded) * @_service: Where to store the actual service ID (may be upgraded)
...@@ -684,7 +685,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, ...@@ -684,7 +685,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
* *_abort should also be initialised to 0. * *_abort should also be initialised to 0.
*/ */
int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call, int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
struct iov_iter *iter, struct iov_iter *iter, size_t *_len,
bool want_more, u32 *_abort, u16 *_service) bool want_more, u32 *_abort, u16 *_service)
{ {
size_t offset = 0; size_t offset = 0;
...@@ -692,7 +693,7 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call, ...@@ -692,7 +693,7 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
_enter("{%d,%s},%zu,%d", _enter("{%d,%s},%zu,%d",
call->debug_id, rxrpc_call_states[call->state], call->debug_id, rxrpc_call_states[call->state],
iov_iter_count(iter), want_more); *_len, want_more);
ASSERTCMP(call->state, !=, RXRPC_CALL_SERVER_SECURING); ASSERTCMP(call->state, !=, RXRPC_CALL_SERVER_SECURING);
...@@ -703,8 +704,8 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call, ...@@ -703,8 +704,8 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
case RXRPC_CALL_SERVER_RECV_REQUEST: case RXRPC_CALL_SERVER_RECV_REQUEST:
case RXRPC_CALL_SERVER_ACK_REQUEST: case RXRPC_CALL_SERVER_ACK_REQUEST:
ret = rxrpc_recvmsg_data(sock, call, NULL, iter, ret = rxrpc_recvmsg_data(sock, call, NULL, iter,
iov_iter_count(iter), 0, *_len, 0, &offset);
&offset); *_len -= offset;
if (ret < 0) if (ret < 0)
goto out; goto out;
......
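
To round off the rxrpc change: callers now own the residual-length counter, which rxrpc_kernel_recv_data() decrements as it consumes data, so the residue survives -EAGAIN returns. A hedged usage sketch, assuming sock and rxcall are already set up:

    u8 buf[64];
    struct kvec kv = { .iov_base = buf, .iov_len = sizeof(buf) };
    struct iov_iter iter;
    size_t len = sizeof(buf);
    u32 abort_code = 0;
    u16 service_id;
    int ret;

    iov_iter_kvec(&iter, READ, &kv, 1, len);
    do {
            ret = rxrpc_kernel_recv_data(sock, rxcall, &iter, &len,
                                         true, &abort_code, &service_id);
            /* on -EAGAIN, wait for more data; len now holds the residue */
    } while (ret == -EAGAIN);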