1. 28 Dec, 2023 26 commits
    • 9p: Use netfslib read/write_iter · 80105ed2
      David Howells authored
      Use netfslib's read and write iteration helpers, allowing netfslib to take
      over the management of the page cache for 9p files and to manage local disk
      caching.  In particular, this eliminates write_begin, write_end, writepage
      and all mentions of struct page and struct folio from 9p.
      
      Note that netfslib now offers the possibility of write-through caching if
      that is desirable for 9p: just set the NETFS_ICTX_WRITETHROUGH flag in
      v9inode->netfs.flags in v9fs_set_netfs_context().
      
      Note also that this is untested, as I can't get ganesha.nfsd to correctly
      parse the config to turn on 9p support.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: Eric Van Hensbergen <ericvh@kernel.org>
      cc: Latchesar Ionkov <lucho@ionkov.net>
      cc: Dominique Martinet <asmadeus@codewreck.org>
      cc: Christian Schoenebeck <linux_oss@crudebyte.com>
      cc: v9fs@lists.linux.dev
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
    • afs: Use the netfs write helpers · 3560358a
      David Howells authored
      Make afs use the netfs write helpers.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Export the netfs_sreq tracepoint · 545b135b
      David Howells authored
      Export the netfs_sreq tracepoint so that it can be called directly from
      client filesystems/cache backend modules.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Optimise away reads above the point at which there can be no data · 100ccd18
      David Howells authored
      Track the file position above which the server is not expected to have any
      data (the "zero point") and preemptively assume that we can satisfy
      requests by filling them with zeroes locally rather than attempting to
      download them if they're over that line - even if we've written data back
      to the server.  Assume that any data that was written back above that
      position is held in the local cache.  Note that we have to split requests
      that straddle the line.
      
      Make use of this to optimise away some reads from the server.  We need to
      set the zero point in the following circumstances:
      
       (1) When we see an extant remote inode and have no cache for it, we set
           the zero_point to i_size.
      
       (2) On local inode creation, we set zero_point to 0.
      
       (3) On local truncation down, we reduce zero_point to the new i_size if
           the new i_size is lower.
      
       (4) On local truncation up, we don't change zero_point.
      
       (5) On local modification, we don't change zero_point.
      
       (6) On remote invalidation, we set zero_point to the new i_size.
      
       (7) If stored data is discarded from the pagecache or culled from fscache,
           we must set zero_point above that if the data also got written to the
           server.
      
       (8) If dirty data is written back to the server, but not fscache, we must
           set zero_point above that.
      
       (9) If a direct I/O write is made, set zero_point above that.
      
      Assuming the above, any read from the server at or above the zero_point
      position will return all zeroes.
      
      The zero_point value can be stored in the cache, provided the above rules
      are applied to it by any code that culls part of the local cache.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
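      As a rough illustration of the zero_point rules above, here is a minimal
      userspace C sketch, not netfslib code.  The struct zp_inode and the zp_*()
      helpers are hypothetical names; the sketch tracks only i_size and
      zero_point, and shows how a read is split at zero_point so that only the
      part below it is fetched from the server while the rest is zero-filled
      locally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an inode's size state (not a kernel struct). */
struct zp_inode {
	uint64_t i_size;
	uint64_t zero_point; /* server holds no data at or above this */
};

/* Rules (1) and (6): remote inode seen with no cache, or remote
 * invalidation: zero_point follows the server's i_size. */
static void zp_set_from_server(struct zp_inode *ino, uint64_t server_size)
{
	ino->i_size = server_size;
	ino->zero_point = server_size;
}

/* Rule (2): a locally created inode has nothing on the server. */
static void zp_create(struct zp_inode *ino)
{
	ino->i_size = 0;
	ino->zero_point = 0;
}

/* Rules (3) and (4): truncating down may lower zero_point; truncating
 * up leaves it unchanged. */
static void zp_truncate(struct zp_inode *ino, uint64_t new_size)
{
	if (new_size < ino->zero_point)
		ino->zero_point = new_size;
	ino->i_size = new_size;
}

/* Rules (7)-(9): data up to 'end' reached the server and is not (or no
 * longer) held in the local cache, so zero_point must be raised over it. */
static void zp_raise_above(struct zp_inode *ino, uint64_t end)
{
	if (end > ino->zero_point)
		ino->zero_point = end;
}

/* Split a read of [start, start + len) at zero_point.  Returns how many
 * bytes must be fetched from the server; the remainder can be
 * zero-filled locally. */
static uint64_t zp_server_part(const struct zp_inode *ino,
			       uint64_t start, uint64_t len)
{
	assert(start + len >= start); /* no wrap-around */
	if (start >= ino->zero_point)
		return 0;
	if (start + len <= ino->zero_point)
		return len;
	return ino->zero_point - start; /* request straddles the line */
}
```

      For instance, with zero_point at 10000, a 4096-byte read at 8192 needs
      only 1808 bytes from the server, and a read at 12288 needs none.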
    • netfs: Implement a write-through caching option · 41d8e767
      David Howells authored
      Provide a flag whereby a filesystem may request that netfs_perform_write()
      perform write-through caching.  This involves putting pages directly into
      writeback rather than dirty and attaching them to a write operation as we
      go.
      
      Further, the writes being made are limited to the byte range being written
      rather than whole folios being written.  This can be used by cifs, for
      example, to deal with strict byte-range locking.
      
      This can't be used with content encryption as that may require expansion of
      the write RPC beyond the write being made.
      
      This doesn't affect writes via mmap - those are written back in the normal
      way; similarly failed writethrough writes are marked dirty and left to
      writeback to retry.  Another option would be to simply invalidate them, but
      the contents can be simultaneously accessed by read() and through mmap.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
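      A minimal sketch of the difference in write extent, assuming (as a
      simplification) that ordinary writeback sends whole dirty folios: the
      hypothetical helper write_rpc_span() returns the byte range a write RPC
      would cover in each mode.  It is not a netfslib function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct byte_range {
	uint64_t start, end; /* half-open: [start, end) */
};

/* Hypothetical: the span a write RPC covers for a write of len bytes at
 * pos.  folio_size must be a power of two.  Simplification: ordinary
 * writeback is assumed to flush whole folios. */
static struct byte_range write_rpc_span(uint64_t pos, uint64_t len,
					uint64_t folio_size, bool writethrough)
{
	struct byte_range r = { pos, pos + len };

	assert(folio_size != 0 && (folio_size & (folio_size - 1)) == 0);
	if (!writethrough) {
		/* Round out to the enclosing folio boundaries. */
		r.start = pos & ~(folio_size - 1);
		r.end = (pos + len + folio_size - 1) & ~(folio_size - 1);
	}
	return r;
}
```

      For a 100-byte write at offset 5000 with 4096-byte folios, write-through
      sends only bytes 5000-5099, whereas folio-granular writeback would send
      4096-8191; staying within the caller's range is what makes the option
      useful for strict byte-range locking.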
    • netfs: Provide a launder_folio implementation · 4a79616c
      David Howells authored
      Provide a launder_folio implementation for netfslib.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Provide a writepages implementation · 62c3b748
      David Howells authored
      Provide an implementation of writepages for network filesystems to delegate
      to.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs, cachefiles: Pass upper bound length to allow expansion · e0ace6ca
      David Howells authored
      Make netfslib pass the maximum length to the ->prepare_write() op to tell
      the cache how much it can expand the length of a write to.  This allows a
      write to the server at the end of a file to be limited to a few bytes
      whilst writing an entire block to the cache (something required by direct
      I/O).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
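      The expansion can be sketched as follows.  This is a hypothetical helper,
      not the ->prepare_write() signature: it assumes the write starts on a
      cache-block boundary, rounds only the end up to the block size, and
      refuses if that would exceed the upper bound supplied by netfslib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache-side expansion of a write to whole cache blocks.
 * start:     file offset of the write (assumed block-aligned here)
 * len:       in: bytes the server write covers; out: bytes for the cache
 * upper_len: maximum length the write may be expanded to
 * block:     cache block size (power of two)
 * Returns false, leaving *len alone, if alignment would exceed upper_len. */
static bool cache_expand_write(uint64_t start, uint64_t *len,
			       uint64_t upper_len, uint64_t block)
{
	uint64_t aligned;

	assert(block != 0 && (block & (block - 1)) == 0);
	assert((start & (block - 1)) == 0);
	(void)start; /* only used by the assertion */
	aligned = (*len + block - 1) & ~(block - 1);
	if (aligned > upper_len)
		return false;
	*len = aligned;
	return true;
}
```

      For example, a 10-byte write at the end of a file at offset 8192, with a
      4096-byte block and a 4096-byte upper bound, goes to the server as 10
      bytes but to the cache as one full 4096-byte block.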
    • netfs: Provide netfs_file_read_iter() · 80645bd4
      David Howells authored
      Provide a top-level-ish function that can be pointed to directly by the
      ->read_iter file op.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite() · 102a7e2c
      David Howells authored
      Provide an entry point to delegate a filesystem's ->page_mkwrite() to.
      This checks for conflicting writes, then attaches any netfs-specific group
      marking (e.g. ceph snap) to the page to be considered dirty.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Implement buffered write API · 938e13a7
      David Howells authored
      Institute a netfs write helper, netfs_file_write_iter(), to be pointed at
      by the network filesystem ->write_iter() call.  Make it handle buffered
      writes by calling the previously defined netfs_perform_write() to copy the
      source data into the pagecache.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Implement unbuffered/DIO write support · 153a9961
      David Howells authored
      Implement support for unbuffered writes and direct I/O writes.  If the
      write is misaligned with respect to the fscrypt block size, then RMW cycles
      are performed if necessary.  DIO writes are a special case of unbuffered
      writes with extra restrictions imposed, such as block size alignment
      requirements.
      
      Also provide a field that can tell the code to add some extra space onto
      the bounce buffer for use by the filesystem in the case of a
      content-encrypted file.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
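      To illustrate when RMW is needed, the hypothetical plan_unbuffered_write()
      below rounds a write out to the (power-of-two) block size and reports how
      many existing bytes before and after the caller's data would have to be
      read back first.  It is a sketch under simplifying assumptions: it
      ignores EOF, where the tail could be zero-filled instead of read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plan for an unbuffered write of len bytes at pos. */
struct rmw_plan {
	uint64_t start, end; /* block-aligned span that will be written */
	uint64_t head_read;  /* existing bytes in [start, pos) to read first */
	uint64_t tail_read;  /* existing bytes in [pos + len, end) to read first */
};

/* block is the crypto block size and must be a power of two.  An aligned
 * write needs no reads at all.  If head and tail fall in the same block,
 * a real implementation would read that block only once. */
static struct rmw_plan plan_unbuffered_write(uint64_t pos, uint64_t len,
					     uint64_t block)
{
	struct rmw_plan p;

	assert(block != 0 && (block & (block - 1)) == 0);
	p.start = pos & ~(block - 1);
	p.end = (pos + len + block - 1) & ~(block - 1);
	p.head_read = pos - p.start;
	p.tail_read = p.end - (pos + len);
	return p;
}
```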
    • netfs: Implement unbuffered/DIO read support · 016dc851
      David Howells authored
      Implement support for unbuffered and DIO reads in the netfs library,
      utilising the existing read helper code to do block splitting and
      individual queuing.  The code also handles extraction of the destination
      buffer from the supplied iterator, allowing async unbuffered reads to take
      place.
      
      The read will be split up according to the rsize setting and, if supplied,
      the ->clamp_length() method.  Note that the next subrequest will be issued
      as soon as issue_op returns, without waiting for previous ones to finish.
      The network filesystem needs to pause or handle queuing them if it doesn't
      want to fire them all at the server simultaneously.
      
      Once all the subrequests have finished, the state will be assessed and the
      amount of data to be indicated as having been obtained will be
      determined.  As the subrequests may finish in any order, if an intermediate
      subrequest is short, any further subrequests may be copied into the buffer
      and then abandoned.
      
      In the future, this will also take care of doing an unbuffered read from
      encrypted content, with the decryption being done by the library.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
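      The splitting and result assessment described above might look like this
      in a simplified userspace form.  split_read() and read_result() are
      hypothetical names; the sketch splits only by rsize (no ->clamp_length())
      and counts data up to and including the first short subrequest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one subrequest of an unbuffered read. */
struct subreq {
	uint64_t start;       /* file offset */
	uint64_t len;         /* bytes requested */
	uint64_t transferred; /* bytes actually obtained */
};

/* Split [start, start + len) into subrequests of at most rsize bytes.
 * Returns the number of entries written to out (at most max_out). */
static size_t split_read(uint64_t start, uint64_t len, uint64_t rsize,
			 struct subreq *out, size_t max_out)
{
	size_t n = 0;

	assert(rsize > 0);
	while (len > 0 && n < max_out) {
		uint64_t part = len < rsize ? len : rsize;

		out[n].start = start;
		out[n].len = part;
		out[n].transferred = 0;
		n++;
		start += part;
		len -= part;
	}
	return n;
}

/* Once all subrequests have finished (in any order), work out how much
 * contiguous data can be reported: stop after the first short one; any
 * data copied in by later subrequests is abandoned. */
static uint64_t read_result(const struct subreq *sr, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		total += sr[i].transferred;
		if (sr[i].transferred < sr[i].len)
			break;
	}
	return total;
}
```

      For a 10000-byte read with an rsize of 4096, split_read() produces
      subrequests of 4096, 4096 and 1808 bytes; if the second returns only 1000
      bytes, read_result() reports 5096 even if the third completed in full.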
    • netfs: Allocate multipage folios in the writepath · e2e2e839
      David Howells authored
      Allocate a multipage folio when copying data into the pagecache, where
      possible, if there's sufficient data to warrant it.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
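      One plausible way to pick a folio size is sketched below; the heuristic is
      an assumption, not taken from this commit.  The hypothetical
      pick_folio_order() grows the order while a folio of that size still fits
      within the data being copied, then shrinks it until the folio is
      naturally aligned at the file position, assuming 4096-byte pages.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4096-byte pages */

/* Hypothetical: pick a folio order (log2 of the page count) for copying
 * len bytes starting at file position pos, capped at max_order. */
static unsigned int pick_folio_order(uint64_t pos, uint64_t len,
				     unsigned int max_order)
{
	uint64_t index = pos >> SKETCH_PAGE_SHIFT;
	unsigned int order = 0;

	assert(max_order < 32); /* keep the shifts below well defined */
	/* Grow while the next size up still fits in the data to copy. */
	while (order < max_order &&
	       (1ULL << (order + 1 + SKETCH_PAGE_SHIFT)) <= len)
		order++;
	/* Shrink until the folio would be naturally aligned at index. */
	while (order > 0 && (index & ((1ULL << order) - 1)))
		order--;
	return order;
}
```

      With 64KiB of data at offset 0 this picks order 4 (a 64KiB folio), but at
      offset 4096 it falls back to order 0 because a larger folio could not be
      aligned there.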
    • netfs: Make netfs_read_folio() handle streaming-write pages · 7f84a7b9
      David Howells authored
      netfs_read_folio() needs to handle partially-valid pages that are marked
      dirty, but not uptodate, in the event that someone tries to read a page
      that was used to cache data by a streaming write.
      
      In such a case, make netfs_read_folio() set up a bvec iterator that points
      to the parts of the folio that need filling and to a sink page for the data
      that should be discarded and use that instead of i_pages as the iterator to
      be written to.
      
      This requires netfs_rreq_unlock_folios() to convert the page into a normal
      dirty uptodate page, getting rid of the partial write record and bumping
      the group pointer over to folio->private.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
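      The destination layout described above can be modelled as a list of up to
      three segments.  This sketch uses a hypothetical struct seg in place of a
      real bio_vec; segments flagged to_sink cover the part of the folio that
      already holds dirty data, whose freshly read copy must be discarded
      rather than allowed to overwrite it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor, loosely analogous to a bio_vec. */
struct seg {
	uint64_t len;
	bool to_sink; /* true: read data is dropped into a sink page */
};

/* Build the read destination for a folio of folio_size bytes that already
 * holds dirty data in [dirty_off, dirty_off + dirty_len).  Returns the
 * number of segments written to out (at most 3). */
static size_t build_fill_segments(uint64_t folio_size, uint64_t dirty_off,
				  uint64_t dirty_len, struct seg out[3])
{
	size_t n = 0;
	uint64_t dirty_end = dirty_off + dirty_len;

	assert(dirty_end <= folio_size);
	if (dirty_off > 0)
		out[n++] = (struct seg){ dirty_off, false };
	if (dirty_len > 0)
		out[n++] = (struct seg){ dirty_len, true };
	if (dirty_end < folio_size)
		out[n++] = (struct seg){ folio_size - dirty_end, false };
	return n;
}
```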
    • netfs: Provide func to copy data to pagecache for buffered write · c38f4e96
      David Howells authored
      Provide a netfs write helper, netfs_perform_write() to buffer data to be
      written in the pagecache and mark the modified folios dirty.
      
      It will perform "streaming writes" for folios that aren't currently
      resident, if possible, storing data in partially modified folios that are
      marked dirty, but not uptodate.  It will also tag pages as belonging to
      fs-specific write groups if so directed by the filesystem.
      
      This is derived from generic_perform_write(), but doesn't use
      ->write_begin() and ->write_end(), having that logic rolled in instead.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
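      The streaming-write bookkeeping can be sketched as a small state update on
      a per-folio record.  Everything here is hypothetical: struct stream_folio
      stands in for the real partial-write record, and since the commit does
      not say how a write that would leave a hole in the region is handled, the
      sketch simply reports SW_NEEDS_FLUSH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-folio record: the folio is dirty but only
 * [dirty_offset, dirty_offset + dirty_len) holds valid data. */
struct stream_folio {
	uint64_t size;
	uint64_t dirty_offset;
	uint64_t dirty_len;
	bool uptodate;
};

enum sw_result { SW_STORED, SW_NOW_UPTODATE, SW_NEEDS_FLUSH };

/* Record a write of len bytes at offset within a non-uptodate folio.  A
 * write adjacent to or overlapping the existing region extends it; one
 * that would leave a hole is refused and the record left unchanged. */
static enum sw_result stream_write(struct stream_folio *f,
				   uint64_t offset, uint64_t len)
{
	uint64_t start, end;

	assert(!f->uptodate && offset + len <= f->size);
	if (f->dirty_len == 0) {
		start = offset;
		end = offset + len;
	} else {
		uint64_t cur_end = f->dirty_offset + f->dirty_len;

		if (offset > cur_end || offset + len < f->dirty_offset)
			return SW_NEEDS_FLUSH;
		start = offset < f->dirty_offset ? offset : f->dirty_offset;
		end = offset + len > cur_end ? offset + len : cur_end;
	}
	f->dirty_offset = start;
	f->dirty_len = end - start;
	if (start == 0 && end == f->size) {
		f->uptodate = true; /* whole folio now holds valid data */
		return SW_NOW_UPTODATE;
	}
	return SW_STORED;
}
```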
    • netfs: Dispatch write requests to process a writeback slice · 0e0f2dfe
      David Howells authored
      Dispatch one or more write requests to process a writeback slice, where a
      slice is tailored more to logical block divisions within the file (such as
      crypto blocks, an object layout or cache granules) than the protocol RPC
      maximum capacity.
      
      The dispatch doesn't happen until throttling allows, at which point the
      entire writeback slice is processed and queued.  A slice may be written to
      multiple destinations (one or more servers and the local cache) and the
      writes to each destination might be split up along different lines.
      
      The writeback slice holds the required folios pinned.  An iov_iter is
      provided in netfs_write_request that describes the buffer to be used.  This
      may be part of the pagecache, may have auxiliary padding pages attached or
      may be a bounce buffer resulting from crypto or compression.  Consequently,
      the filesystem must not twiddle the folio markings directly.
      
      The following API is available to the filesystem:
      
       (1) The ->create_write_requests() method is called to ask the filesystem
           to create the requests it needs.  This is passed the writeback slice
           to be processed.
      
       (2) The filesystem should then call netfs_create_write_request() to create
           the requests it needs.
      
       (3) Once a request is initialised, netfs_queue_write_request() can be
           called to dispatch it asynchronously, if not completed immediately.
      
       (4) netfs_write_request_completed() should be called to note the
           completion of a request.
      
       (5) netfs_get_write_request() and netfs_put_write_request() are provided
           to refcount a request.  These take constants from the netfs_wreq_trace
           enum for logging into ftrace.
      
       (6) The ->free_write_request() method is called to ask the filesystem to
           clean up a request.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
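      The point that each destination may split the same slice along different
      lines can be illustrated with the hypothetical split_slice_for_dests()
      below.  It is not the ->create_write_requests() API; it only shows
      independent splitting by a per-destination maximum request size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical destination: a server or the local cache. */
struct dest {
	const char *name;
	uint64_t max_req; /* largest single write request it accepts */
};

/* Hypothetical write request aimed at one destination. */
struct wreq {
	size_t dest; /* index into the destination array */
	uint64_t start, len;
};

/* Cover [start, start + len) once per destination, splitting each copy
 * independently by that destination's max_req.  Returns the number of
 * requests written to out (at most max_out). */
static size_t split_slice_for_dests(const struct dest *d, size_t ndest,
				    uint64_t start, uint64_t len,
				    struct wreq *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < ndest; i++) {
		uint64_t pos = start, remain = len;

		assert(d[i].max_req > 0);
		while (remain > 0 && n < max_out) {
			uint64_t part = remain < d[i].max_req ? remain : d[i].max_req;

			out[n++] = (struct wreq){ i, pos, part };
			pos += part;
			remain -= part;
		}
	}
	return n;
}
```

      With a server limited to 64KiB requests and a cache taking 256KiB
      granules, a 256KiB slice becomes four server writes and one cache write.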
    • netfs: Prep to use folio->private for write grouping and streaming write · 9ebff83e
      David Howells authored
      Prepare to use folio->private to hold information about write grouping and
      streaming write.  These are implemented in the same commit as they both
      make use of folio->private and will both be checked at the same time in
      several places.
      
      "Write grouping" involves ordering the writeback of groups of writes, such
      as is needed for ceph snaps.  A group is represented by a
      filesystem-supplied object which must contain a netfs_group struct.  This
      contains just a refcount and a pointer to a destructor.
      
      "Streaming write" is the storage of data in folios that are marked dirty,
      but not uptodate, to avoid unnecessary reads of data.  This is represented
      by a netfs_folio struct.  This contains the offset and length of the
      modified region plus the otherwise displaced write grouping pointer.
      
      The way folio->private is multiplexed is:
      
       (1) If private is NULL then neither is in operation on a dirty folio.
      
       (2) If private is set, with bit 0 clear, then this points to a group.
      
       (3) If private is set, with bit 0 set, then this points to a netfs_folio
           struct (with bit 0 AND'ed out).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
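      The multiplexing of folio->private can be sketched with ordinary pointer
      tagging.  The types and helpers below are hypothetical stand-ins, not
      netfslib's own; the scheme relies on both objects being at least 2-byte
      aligned so that bit 0 of a real pointer is always clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the structs described above. */
struct group {
	int refcount;
};
struct sw_folio {            /* plays the role of struct netfs_folio */
	struct group *group; /* the displaced write-grouping pointer */
	unsigned int offset, length;
};

#define PRIV_STREAMING 0x1UL /* bit 0 set: private points to a sw_folio */

static void *encode_group(struct group *g)
{
	assert(((uintptr_t)g & PRIV_STREAMING) == 0);
	return g;
}

static void *encode_streaming(struct sw_folio *f)
{
	assert(((uintptr_t)f & PRIV_STREAMING) == 0);
	return (void *)((uintptr_t)f | PRIV_STREAMING);
}

/* Decode a private value: returns the group (or NULL) and sets *sw to the
 * streaming-write record if there is one, else NULL. */
static struct group *decode_private(void *priv, struct sw_folio **sw)
{
	uintptr_t v = (uintptr_t)priv;

	*sw = NULL;
	if (!v)
		return NULL;                               /* case (1) */
	if (!(v & PRIV_STREAMING))
		return (struct group *)v;                  /* case (2) */
	*sw = (struct sw_folio *)(v & ~PRIV_STREAMING);    /* case (3) */
	return (*sw)->group;
}
```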
    • netfs: Make the refcounting of netfs_begin_read() easier to use · 4fcccc38
      David Howells authored
      Make the refcounting of netfs_begin_read() easier to use by not eating the
      caller's ref on the netfs_io_request it's given.  This makes it easier to
      use when we need to look in the request struct afterwards.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Make netfs_put_request() handle a NULL pointer · 6ba22d8d
      David Howells authored
      Make netfs_put_request() just return if given a NULL request pointer.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Add a hook to allow tell the netfs to update its i_size · c6dc54dd
      David Howells authored
      Add a hook for netfslib's write helpers to call to tell the network
      filesystem that it should update its i_size.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Extend the netfs_io_*request structs to handle writes · 16af134c
      David Howells authored
      Modify the netfs_io_request struct to act as a point around which writes
      can be coordinated.  It represents and pins a range of pages that need
      writing and a list of regions of dirty data in that range of pages.
      
      If RMW is required, the original data can be downloaded into the bounce
      buffer, decrypted if necessary, the modifications made, then the modified
      data can be reencrypted/recompressed and sent back to the server.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Limit subrequest by size or number of segments · 768ddb1e
      David Howells authored
      Limit a subrequest to a maximum size and/or a maximum number of contiguous
      physical regions.  This permits, for instance, a subreq's iterator to be
      limited to the number of DMA'able segments that a large RDMA request can
      handle.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Add func to calculate pagecount/size-limited span of an iterator · cae932d3
      David Howells authored
      Add a function to work out how much of an ITER_BVEC or ITER_XARRAY iterator
      we can use in a pagecount-limited and size-limited span.  This will be
      used, for example, to limit the number of segments in a subrequest to the
      maximum number of elements that an RDMA transfer can handle.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
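      A minimal sketch of the calculation for a bvec-style array, assuming the
      span starts at the beginning of the first segment: the hypothetical
      limit_span() stops at whichever of the size limit or the segment-count
      limit is reached first.  An ITER_XARRAY iterator would apply the same
      logic with folios as the segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment list, standing in for an ITER_BVEC's bio_vec array. */
struct segment {
	uint64_t len;
};

/* Return how many bytes from the start of segs[0..nsegs) fit within both
 * max_size bytes and max_segs segments.  A segment cut short by max_size
 * still counts as one segment. */
static uint64_t limit_span(const struct segment *segs, size_t nsegs,
			   uint64_t max_size, size_t max_segs)
{
	uint64_t span = 0;

	for (size_t i = 0; i < nsegs && i < max_segs; i++) {
		uint64_t room = max_size - span;

		if (segs[i].len >= room)
			return max_size; /* size limit reached in this segment */
		span += segs[i].len;
	}
	return span;
}
```

      With four 4096-byte segments, a two-segment limit yields 8192 bytes, while
      a 5000-byte size limit yields 5000 bytes even though that touches two
      segments.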
    • netfs: Provide tools to create a buffer in an xarray · 7d828a06
      David Howells authored
      Provide tools to create a buffer in an xarray, with a function to add new
      folios with a mark.  This will be used to create bounce buffers and can be
      used more easily to create a list of folios the span of which would require
      more than a page's worth of bio_vec structs.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
    • netfs: Add support for DIO buffering · 21d706d5
      David Howells authored
      Add a bvec array pointer and an iterator to netfs_io_request for either
      holding a copy of a DIO iterator or a list of all the bits of buffer
      pointed to by a DIO iterator.
      
      There are two problems:  Firstly, if an iovec-class iov_iter is passed to
      ->read_iter() or ->write_iter(), this cannot be passed directly to
      kernel_sendmsg() or kernel_recvmsg() as that may cause locking recursion if
      a fault is generated, so we need to keep track of the pages involved
      separately.
      
      Secondly, if the I/O is asynchronous, we must copy the iov_iter describing
      the buffer before returning to the caller as it may be immediately
      deallocated.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
  2. 24 Dec, 2023 14 commits