  1. 30 Jul, 2023 1 commit
    • nfsd: Fix reading via splice · 101df45e
      David Howells authored
      The loop in nfsd_splice_actor() that chops up a compound page into
      individual pages has a clause that discards a page if the same page is
      seen twice in a row.  This is a problem with the advent of
      shmem_splice_read(), as that inserts zero_pages into the pipe in lieu of
      pages that aren't present in the pagecache, so the same page can
      legitimately appear several times in a row.
      
      Fix this by assuming that the last page is being extended only if the
      currently stored length + starting offset is not on a page boundary.
      
      This can be tested by NFS-exporting a tmpfs filesystem on the test machine,
      using truncate to create a file on it that is more than a page in size
      (e.g. truncate -s 8192) and then reading it by NFS.  Without the fix, the
      first page will be all zeros, but thereafter garbage will be read.
      
      Note: I wonder if we can ever get a situation now where we get a splice
      that gives us contiguous parts of a page in separate actor calls.  As NFSD
      can only be splicing from a file (I think), there are only three sources of
      the page: copy_splice_read(), shmem_splice_read() and file_splice_read().
      The first allocates pages for the data it reads, so the problem cannot
      occur; the second should never see a partial page; and the third waits for
      each page to become available before we're allowed to read from it.
      
      Fixes: bd194b18 ("shmem: Implement splice-read")
      Reported-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: NeilBrown <neilb@suse.de>
      cc: Hugh Dickins <hughd@google.com>
      cc: Jens Axboe <axboe@kernel.dk>
      cc: Matthew Wilcox <willy@infradead.org>
      cc: linux-nfs@vger.kernel.org
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
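As an illustration of the rule described in 101df45e, here is a minimal user-space sketch, not the kernel code. Every name in it (`sketch_reply`, `sketch_extends_old`, `sketch_extends_fixed`, `sketch_splice`) is hypothetical, and a 4096-byte page size is assumed. It compares treating any repeated page as an extension with treating it as one only when the stored data ends mid-page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical stand-in for the reply-buffer state the check looks at. */
struct sketch_reply {
	const void *last_page;	/* page stored most recently, NULL if none */
	size_t page_base;	/* starting offset of the payload */
	size_t page_len;	/* payload bytes stored so far */
};

/* Old rule: any repeat of the previous page counts as an extension. */
static bool sketch_extends_old(const struct sketch_reply *r, const void *page)
{
	return page == r->last_page;
}

/* Fixed rule: a repeat is an extension only if the stored data ends mid-page. */
static bool sketch_extends_fixed(const struct sketch_reply *r, const void *page)
{
	size_t end = r->page_base + r->page_len;

	return page == r->last_page && (end % SKETCH_PAGE_SIZE) != 0;
}

/* Store n chunks and return how many page slots were consumed. */
static size_t sketch_splice(struct sketch_reply *r, const void *const *pages,
			    const size_t *lens, size_t n, bool fixed)
{
	size_t slots = 0;

	assert(r && pages && lens);
	for (size_t i = 0; i < n; i++) {
		bool extends = fixed ? sketch_extends_fixed(r, pages[i])
				     : sketch_extends_old(r, pages[i]);

		if (!extends) {
			r->last_page = pages[i];	/* takes a new slot */
			slots++;
		}
		r->page_len += lens[i];
	}
	return slots;
}
```

In this model, two full-page chunks that both point at the same zero page take one slot under the old rule and two under the fixed rule. A page delivered as two partial chunks still takes a single slot.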
  2. 18 Jul, 2023 1 commit
  3. 27 Jun, 2023 1 commit
  4. 21 Jun, 2023 1 commit
  5. 18 Jun, 2023 3 commits
    • svcrdma: Fix stale comment · 88770b8d
      Chuck Lever authored
      Commit 7d81ee87 ("svcrdma: Single-stage RDMA Read") changed the
      behavior of svc_rdma_recvfrom() but neglected to update the
      documenting comment.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Distinguish per-net namespace initialization · 5e092be7
      Chuck Lever authored
      I find the naming of nfsd_init_net() and nfsd_startup_net() to be
      confusingly similar. Rename the namespace initialization and teardown
      ops and add comments to distinguish their separate purposes.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net · ed9ab734
      Jeff Layton authored
      Commit f5f9d4a3 ("nfsd: move reply cache initialization into nfsd
      startup") moved the initialization of the reply cache into nfsd startup,
      but didn't account for the stats counters, which can be accessed before
      nfsd is ever started. The result can be a NULL pointer dereference when
      someone accesses /proc/fs/nfsd/reply_cache_stats while nfsd is still
      shut down.
      
      This is a regression and a user-triggerable oops in the right situation:
      
      - non-x86_64 arch
      - /proc/fs/nfsd is mounted in the namespace
      - nfsd is not started in the namespace
      - unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats"
      
      Although this is easy to trigger on some arches (like aarch64), on
      x86_64, calling this_cpu_ptr(NULL) evidently returns a pointer to the
      fixed_percpu_data. That struct looks just enough like a newly
      initialized percpu var to allow nfsd_reply_cache_stats_show to access
      it without Oopsing.
      
      Move the initialization of the per-net+per-cpu reply-cache counters
      back into nfsd_init_net, while leaving the rest of the reply cache
      allocations to be done at nfsd startup time.
      
      Kudos to Eirik who did most of the legwork to track this down.
      
      Cc: stable@vger.kernel.org # v6.3+
      Fixes: f5f9d4a3 ("nfsd: move reply cache initialization into nfsd startup")
      Reported-and-tested-by: Eirik Fuller <efuller@redhat.com>
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2215429
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
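The split of responsibilities described in ed9ab734 can be modeled roughly as follows. This is a user-space sketch under stated assumptions: all `sketch_*` names are hypothetical, a plain heap array stands in for the per-cpu counters, and an opaque allocation stands in for the rest of the reply cache. The point is only that the statistics reader depends on state set up at namespace initialization, not at service startup.

```c
#include <assert.h>
#include <stdlib.h>

enum { SKETCH_NR_STATS = 3 };	/* arbitrary counter count for the sketch */

/* Hypothetical per-namespace state. */
struct sketch_net {
	long *stats;	/* stands in for the per-cpu stats counters */
	void *cache;	/* stands in for the rest of the reply cache */
};

/* Namespace init: the counters must exist from here on. */
static int sketch_init_net(struct sketch_net *nn)
{
	nn->cache = NULL;
	nn->stats = calloc(SKETCH_NR_STATS, sizeof(*nn->stats));
	return nn->stats ? 0 : -1;
}

/* Service startup: only the remaining allocations happen here. */
static int sketch_startup_net(struct sketch_net *nn)
{
	nn->cache = calloc(1, 64);
	return nn->cache ? 0 : -1;
}

static void sketch_shutdown_net(struct sketch_net *nn)
{
	free(nn->cache);
	nn->cache = NULL;
}

static void sketch_exit_net(struct sketch_net *nn)
{
	free(nn->stats);
	nn->stats = NULL;
}

/* Stats reader: may run whether or not the service was ever started. */
static long sketch_stats_total(const struct sketch_net *nn)
{
	long total = 0;

	/* If the counters were allocated at startup, this would be NULL here. */
	assert(nn->stats != NULL);
	for (int i = 0; i < SKETCH_NR_STATS; i++)
		total += nn->stats[i];
	return total;
}
```

With this layout, calling `sketch_stats_total()` after `sketch_init_net()` alone is safe, which corresponds to reading the stats file while the service is still shut down.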
  6. 17 Jun, 2023 11 commits
  7. 12 Jun, 2023 11 commits
  8. 11 Jun, 2023 6 commits
    • nfsd: don't provide pre/post-op attrs if fh_getattr fails · 518f375c
      Jeff Layton authored
      nfsd calls fh_getattr to get the latest inode attrs for pre/post-op
      info. In the event that fh_getattr fails, it resorts to scraping cached
      values out of the inode directly.
      
      Since these attributes are optional, we can just skip providing them
      altogether when this happens.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Neil Brown <neilb@suse.de>
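The approach in 518f375c, omitting optional attributes when the lookup fails, might look like this in a simplified user-space model. All `sketch_*` names are hypothetical, the two-field attribute struct is invented for illustration, and the encoding is a toy layout (a presence flag followed by the fields), not the real wire format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented, minimal attribute set for the sketch. */
struct sketch_attrs {
	uint64_t size;
	uint64_t change;
};

struct sketch_post_op {
	bool attributes_follow;	/* false: the attributes are simply omitted */
	struct sketch_attrs attrs;
};

typedef int (*sketch_getattr_fn)(struct sketch_attrs *out);

/* Example lookups: one that succeeds, one that fails. */
static int sketch_getattr_ok(struct sketch_attrs *out)
{
	out->size = 8192;
	out->change = 7;
	return 0;
}

static int sketch_getattr_fail(struct sketch_attrs *out)
{
	(void)out;
	return -1;
}

/* On failure, mark the attrs absent instead of falling back to cached values. */
static void sketch_fill_post_op(struct sketch_post_op *po,
				sketch_getattr_fn getattr)
{
	po->attributes_follow = (getattr(&po->attrs) == 0);
}

/* Toy encoder: returns the number of 32-bit words written to buf. */
static size_t sketch_encode_post_op(const struct sketch_post_op *po,
				    uint32_t *buf)
{
	size_t n = 0;

	assert(po && buf);
	buf[n++] = po->attributes_follow ? 1 : 0;
	if (!po->attributes_follow)
		return n;
	buf[n++] = (uint32_t)(po->attrs.size >> 32);
	buf[n++] = (uint32_t)po->attrs.size;
	buf[n++] = (uint32_t)(po->attrs.change >> 32);
	buf[n++] = (uint32_t)po->attrs.change;
	return n;
}
```

A failed lookup produces a single flag word with value 0, so the client sees "no attributes" rather than possibly stale ones.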
    • NFSD: Remove nfsd_readv() · df56b384
      Chuck Lever authored
      nfsd_readv()'s consumers now use nfsd_iter_read().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Hoist rq_vec preparation into nfsd_read() [step two] · 703d7521
      Chuck Lever authored
      Now that the preparation of an rq_vec has been removed from the
      generic read path, nfsd_splice_read() no longer needs to reset
      rq_next_page.
      
      nfsd4_encode_read() calls nfsd_splice_read() directly. As far as I
      can ascertain, resetting rq_next_page for NFSv4 splice reads is
      unnecessary because rq_next_page is already set correctly.
      
      Moreover, resetting it might even be incorrect if previous
      operations in the COMPOUND have already consumed at least a page of
      the send buffer. I would expect that the result would be encoding
      the READ payload over previously-encoded results.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Hoist rq_vec preparation into nfsd_read() · 507df40e
      Chuck Lever authored
      Accrue the following benefits:
      
      a) Deduplicate this common bit of code.
      
      b) Don't prepare rq_vec for NFSv2 and NFSv3 spliced reads, which
         don't use rq_vec. This is already the case for
         nfsd4_encode_read().
      
      c) Eventually, converting NFSD's read path to use a bvec iterator
         will be simpler.
      
      In the next patch, nfsd_iter_read() will replace nfsd_readv() for
      all NFS versions.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Update rq_next_page between COMPOUND operations · ed4a567a
      Chuck Lever authored
      A GETATTR with a large result can advance xdr->page_ptr without
      updating rq_next_page. If a splice READ follows that GETATTR in the
      COMPOUND, nfsd_splice_actor can start splicing at the wrong page.
      
      I've also seen READLINK and READDIR leave rq_next_page in an
      unmodified state.
      
      There are potentially a myriad of combinations like this, so play it
      safe: move the rq_next_page update to nfsd4_encode_operation.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
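The idea in ed4a567a, resynchronizing the next-page marker after every operation instead of trusting each encoder to do it, can be sketched in user space as below. The `sketch_*` types and functions are hypothetical, page positions are plain indices, a 4096-byte page is assumed, and "next page is the one after the stream's current page" is an assumption of this model, not a statement about the kernel's exact bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical encode stream: which page it is on and how full that page is. */
struct sketch_stream {
	size_t page_idx;
	size_t used;
};

/* Hypothetical request state: where a following splice read would start. */
struct sketch_rqst {
	size_t next_page;
};

/* An encoder writing nbytes may move the stream onto later pages. */
static void sketch_reserve(struct sketch_stream *xdr, size_t nbytes)
{
	xdr->used += nbytes;
	xdr->page_idx += xdr->used / SKETCH_PAGE_SIZE;
	xdr->used %= SKETCH_PAGE_SIZE;
}

/* Common wrapper: encode one result, then always resync next_page. */
static void sketch_encode_operation(struct sketch_rqst *rq,
				    struct sketch_stream *xdr,
				    size_t result_bytes)
{
	assert(rq && xdr);
	sketch_reserve(xdr, result_bytes);
	rq->next_page = xdr->page_idx + 1;
}
```

Because the update lives in the common wrapper, a large result that spills onto a new page moves `next_page` forward no matter which operation produced it.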
    • NFSD: Use svcxdr_encode_opaque_pages() in nfsd4_encode_splice_read() · ba21e20b
      Chuck Lever authored
      Commit 15b23ef5 ("nfsd4: fix corruption of NFSv4 read data")
      encountered exactly the same issue: after a splice read, a
      filesystem-owned page is left in rq_pages[]; the symptoms are the
      same as described there.
      
      If the computed number of pages in nfsd4_encode_splice_read() is not
      exactly the same as the actual number of pages that were consumed by
      nfsd_splice_actor() (say, because of a bug) then hilarity ensues.
      
      Instead of recomputing the page offset based on the size of the
      payload, use rq_next_page, which is already properly updated by
      nfsd_splice_actor(), to cause svc_rqst_release_pages() to operate
      correctly in every instance.
      
      This is a defensive change since we believe that after commit
      27c934dd ("nfsd: don't replace page in rq_pages if it's a
      continuation of last page") has been applied, there are no known
      opportunities for nfsd_splice_actor() to screw up. So I'm not
      marking it for stable backport.
      Reported-by: Andy Zlotek <andy.zlotek@oracle.com>
      Suggested-by: Calum Mackay <calum.mackay@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  9. 05 Jun, 2023 5 commits