1. 22 Aug, 2023 2 commits
    • ceph: preallocate inode for ops that may create one · ec9595c0
      Jeff Layton authored
      
      When creating a new inode, we need to determine the crypto context
      before we can transmit the RPC. The fscrypt API has a routine for getting
      a crypto context before a create occurs, but it requires an inode.
      
      Change the ceph code to preallocate an inode in advance of a create of
      any sort (open(), mknod(), symlink(), etc). Move the existing code that
      generates the ACL and SELinux blobs into this routine since that's
      mostly common across all the different codepaths.
      
      In most cases, we just want to allow ceph_fill_trace to use that inode
      after the reply comes in, so add a new field to the MDS request for it
      (r_new_inode).
      
      The async create codepath is a bit different though. In that case, we
      want to hash the inode in advance of the RPC so that it can be used
      before the reply comes in. If the call subsequently fails with
      -EJUKEBOX, then just put the references and clean up the as_ctx. Note
      that with this change, we now need to regenerate the as_ctx when this
      occurs, but it's quite rare for it to happen.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Milind Changire <mchangir@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: add new mount option to enable sparse reads · 03bc06c7
      Jeff Layton authored
      
      Add a new mount option that has the client issue sparse reads instead of
      normal ones. The callers now preallocate a sparse extent buffer that
      the libceph receive code can populate and hand back after the operation
      completes.
      
      After a successful sparse read, we can't use the req->r_result value to
      determine the amount of data "read", so instead we derive the received
      length from the end of the last extent in the buffer. Any
      interstitial holes will have been filled by the receive code.
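
The length calculation described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct sparse_extent` and `sparse_read_len()` are hypothetical names invented here, not the libceph definitions, and the extents are assumed to be sorted by offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one extent reported by a sparse read. */
struct sparse_extent {
    uint64_t off;   /* file offset where this extent's data starts */
    uint64_t len;   /* number of data bytes in this extent */
};

/*
 * Bytes "read", measured from the start of the request to the end of
 * the last extent. Holes between extents count toward the length
 * because the receive side is assumed to have zero-filled them.
 */
static uint64_t sparse_read_len(uint64_t req_off,
                                const struct sparse_extent *ext,
                                size_t cnt)
{
    if (cnt == 0)
        return 0;   /* nothing but holes came back */

    const struct sparse_extent *last = &ext[cnt - 1];

    assert(last->off >= req_off);
    return last->off + last->len - req_off;
}
```

With two extents at offsets 4096 (4096 bytes) and 16384 (100 bytes) and a request starting at offset 0, the sketch reports 16484 bytes, even though only 4196 bytes of real data were transferred.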
      
      [ xiubli: fix a double free on req reported by Ilya ]
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Milind Changire <mchangir@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  2. 30 Jun, 2023 1 commit
  3. 09 Jun, 2023 2 commits
  4. 24 May, 2023 1 commit
  5. 26 Feb, 2023 1 commit
  6. 03 Feb, 2023 1 commit
  7. 02 Feb, 2023 1 commit
  8. 12 Dec, 2022 2 commits
  9. 25 Nov, 2022 1 commit
    • use less confusing names for iov_iter direction initializers · de4eda9d
      Al Viro authored
      
      READ/WRITE proved to be actively confusing - the meanings are
      "data destination, as used with read(2)" and "data source, as
      used with write(2)", but people keep interpreting those as
      "we read data from it" and "we write data to it", i.e. exactly
      the wrong way.
      
      Call them ITER_DEST and ITER_SOURCE - at least that is harder
      to misinterpret...
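
A toy userspace model may make the naming concrete. Everything except the two direction names is an assumption made for illustration: `struct toy_iter` and the `toy_copy_*` helpers are hypothetical, and the enum values are not claimed to match the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Direction names from the commit; numeric values are illustrative. */
enum toy_dir { ITER_DEST, ITER_SOURCE };

/* Hypothetical, heavily simplified iterator over one flat buffer. */
struct toy_iter {
    enum toy_dir dir;
    char *buf;
    size_t count;   /* bytes remaining */
};

/* read(2)-style path: data flows INTO the iterator, so it is a DEST. */
static size_t toy_copy_to_iter(const void *src, size_t n, struct toy_iter *it)
{
    assert(it->dir == ITER_DEST);
    if (n > it->count)
        n = it->count;
    memcpy(it->buf, src, n);
    it->buf += n;
    it->count -= n;
    return n;
}

/* write(2)-style path: data flows OUT OF the iterator, so it is a SOURCE. */
static size_t toy_copy_from_iter(void *dst, size_t n, struct toy_iter *it)
{
    assert(it->dir == ITER_SOURCE);
    if (n > it->count)
        n = it->count;
    memcpy(dst, it->buf, n);
    it->buf += n;
    it->count -= n;
    return n;
}
```

The point of the sketch is that the direction describes the role of the iterator's buffer in the transfer, not what the caller does to the iterator.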
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 09 Aug, 2022 2 commits
    • iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() · 1ef255e2
      Al Viro authored
      
      Most of the users immediately follow successful iov_iter_get_pages()
      with advancing by the amount it had returned.
      
      Provide inline wrappers doing that, convert trivial open-coded
      uses of those.
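
The "get, then advance by what was returned" pattern can be sketched with a toy iterator. The names below (`struct toy_iter`, `toy_get()`, `toy_advance()`, `toy_get_and_advance()`) are hypothetical and stand in for the kernel helpers only in spirit; no pages are pinned here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator reduced to a position and a remaining count. */
struct toy_iter {
    size_t pos;
    size_t count;
};

/* Non-advancing getter: never reports more than it was asked for. */
static size_t toy_get(const struct toy_iter *it, size_t maxsize)
{
    return it->count < maxsize ? it->count : maxsize;
}

static void toy_advance(struct toy_iter *it, size_t n)
{
    assert(n <= it->count);
    it->pos += n;
    it->count -= n;
}

/* Advancing wrapper: folds the open-coded "get, then advance" pair. */
static size_t toy_get_and_advance(struct toy_iter *it, size_t maxsize)
{
    size_t n = toy_get(it, maxsize);

    toy_advance(it, n);
    return n;
}
```

Callers that previously wrote the two steps by hand can call the wrapper instead, which removes one place to get the advance amount wrong.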
      
      BTW, iov_iter_get_pages() never returns more than it had been asked
      to; such checks in cifs ought to be removed someday...
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new iov_iter flavour - ITER_UBUF · fcb14cb1
      Al Viro authored
      
      Equivalent of single-segment iovec.  Initialized by iov_iter_ubuf(),
      checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
      ones.
      
      We are going to expose things like ->write_iter() et al. to those
      in subsequent commits.
      
      New predicate (user_backed_iter()) that is true for ITER_IOVEC and
      ITER_UBUF; places like direct-IO handling should use it to check
      whether pages we modify after getting them from iov_iter_get_pages()
      need to be dirtied.
      
      DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
      will solve all problems - there's code that uses iter_is_iovec() to
      decide how to poke around in iov_iter guts and for that the predicate
      replacement obviously won't suffice.
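
The relationship between the predicates can be sketched as follows. This is a toy model: the `toy_` enum, struct and functions are hypothetical and only mirror the behavior the message describes (the combined predicate is true for both the iovec and ubuf flavours, while the iovec-only check stays false for ubuf).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical iterator flavours; the real set and values may differ. */
enum toy_iter_type {
    TOY_ITER_IOVEC,
    TOY_ITER_KVEC,
    TOY_ITER_BVEC,
    TOY_ITER_UBUF,
};

struct toy_iter {
    enum toy_iter_type type;
};

static bool toy_iter_is_iovec(const struct toy_iter *it)
{
    return it->type == TOY_ITER_IOVEC;
}

static bool toy_iter_is_ubuf(const struct toy_iter *it)
{
    return it->type == TOY_ITER_UBUF;
}

/* True when the memory behind the iterator belongs to userspace. */
static bool toy_user_backed_iter(const struct toy_iter *it)
{
    return toy_iter_is_iovec(it) || toy_iter_is_ubuf(it);
}
```

Code that only asks "is this user memory?" can switch to the combined predicate, but code that goes on to walk an iovec array still needs the iovec-only check, which is the caveat raised above.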
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  11. 03 Aug, 2022 1 commit
  12. 02 Aug, 2022 6 commits
  13. 21 Jul, 2022 1 commit
  14. 09 Jun, 2022 1 commit
    • netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context · 874c8ca1
      David Howells authored
      While randstruct was satisfied with using an open-coded "void *" offset
      cast for the netfs_i_context <-> inode casting, __builtin_object_size() as
      used by FORTIFY_SOURCE was not as easily fooled.  This was causing the
      following complaint[1] from gcc v12:
      
        In file included from include/linux/string.h:253,
                         from include/linux/ceph/ceph_debug.h:7,
                         from fs/ceph/inode.c:2:
        In function 'fortify_memset_chk',
            inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
            inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
        include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
          242 |                         __write_overflow_field(p_size_field, size);
              |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fix this by embedding a struct inode into struct netfs_i_context (which
      should perhaps be renamed to struct netfs_inode).  The struct inode
      vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
      structs and vfs_inode is then simply changed to "netfs.inode" in those
      filesystems.
      
      Further, rename netfs_i_context to netfs_inode, get rid of the
      netfs_inode() function that converted a netfs_i_context pointer to an
      inode pointer (that can now be done with &ctx->inode) and rename the
      netfs_i_context() function to netfs_inode() (which is now a wrapper
      around container_of()).
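
The embedding described above can be illustrated in userspace C. The struct layouts are deliberately tiny stand-ins, and `struct toy_fs_inode`, `toy_netfs_inode()` and `TOY_FS_I()` are hypothetical names; only the general `container_of()` technique is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-ins; the real structs have many more fields. */
struct inode {
    unsigned long i_ino;
};

struct netfs_inode {
    struct inode inode;     /* VFS inode embedded, not located by offset cast */
    void *cache;
};

/* Hypothetical per-filesystem wrapper using "netfs" as the member name. */
struct toy_fs_inode {
    int fs_private;
    struct netfs_inode netfs;
};

/* VFS inode -> netfs context: a thin wrapper around container_of(). */
static struct netfs_inode *toy_netfs_inode(struct inode *inode)
{
    return container_of(inode, struct netfs_inode, inode);
}

/* VFS inode -> filesystem wrapper, going through the netfs member. */
static struct toy_fs_inode *TOY_FS_I(struct inode *inode)
{
    return container_of(toy_netfs_inode(inode), struct toy_fs_inode, netfs);
}
```

Because every conversion goes through named members, the compiler knows the true object bounds, and the result does not depend on where the members end up inside the enclosing struct.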
      
      Most of the changes were done with:
      
        perl -p -i -e 's/vfs_inode/netfs.inode/'g \
              `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`
      
      Kees suggested doing it with a pair structure[2] and a special
      declarator to insert that into the network filesystem's inode
      wrapper[3], but I think it's cleaner to embed it - and then it doesn't
      matter if struct randomisation reorders things.
      
      Dave Chinner suggested using a filesystem-specific VFS_I() function in
      each filesystem to convert that filesystem's own inode wrapper struct
      into the VFS inode struct[4].
      
      Version #2:
       - Fix a couple of missed name changes due to a disabled cifs option.
       - Rename netfs_i_context to netfs_inode
       - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
         structs.
      
      [ This also undoes commit 507160f4 ("netfs: gcc-12: temporarily
        disable '-Wattribute-warning' for now") that is no longer needed ]
      
      Fixes: bc899ee1 ("netfs: Add a netfs inode context")
      Reported-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      cc: Jonathan Corbet <corbet@lwn.net>
      cc: Eric Van Hensbergen <ericvh@gmail.com>
      cc: Latchesar Ionkov <lucho@ionkov.net>
      cc: Dominique Martinet <asmadeus@codewreck.org>
      cc: Christian Schoenebeck <linux_oss@crudebyte.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: Ilya Dryomov <idryomov@gmail.com>
      cc: Steve French <smfrench@gmail.com>
      cc: William Kucharski <william.kucharski@oracle.com>
      cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      cc: Dave Chinner <david@fromorbit.com>
      cc: linux-doc@vger.kernel.org
      cc: v9fs-developer@lists.sourceforge.net
      cc: linux-afs@lists.infradead.org
      cc: ceph-devel@vger.kernel.org
      cc: linux-cifs@vger.kernel.org
      cc: samba-technical@lists.samba.org
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-hardening@vger.kernel.org
      Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
      Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
      Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
      Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
      Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
      Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 10 May, 2022 1 commit
  16. 01 Apr, 2022 1 commit
  17. 01 Mar, 2022 2 commits
  18. 26 Jan, 2022 2 commits
  19. 13 Jan, 2022 1 commit
    • ceph: add new "nopagecache" option · 94cc0877
      Jeff Layton authored
      CephFS is a bit unlike most other filesystems in that it only
      conditionally does buffered I/O based on the caps that it gets from the
      MDS. In most cases, unless there is contended access for an inode, the
      MDS does give Fbc caps to the client, so the unbuffered codepaths are
      only infrequently traveled and are difficult to test.
      
      At one time, the "-o sync" mount option would give you this behavior,
      but that was removed in commit 7ab9b380 ("ceph: Don't use
      ceph-sync-mode for synchronous-fs.").
      
      Add a new mount option to tell the client to ignore Fbc caps when doing
      I/O, and to use the synchronous codepaths exclusively, even on
      non-O_DIRECT file descriptors. We already have an ioctl that forces this
      behavior on a per-file basis, so we can just always set the CEPH_F_SYNC
      flag in the file description on such mounts.
      
      Additionally, this patch changes the client to not request Fbc when
      doing direct I/O. We aren't using the cache with O_DIRECT so we don't
      have any need for those caps.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Acked-by: Greg Farnum <gfarnum@redhat.com>
      Reviewed-by: Venky Shankar <vshankar@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  20. 11 Jan, 2022 1 commit
  21. 01 Dec, 2021 2 commits
  22. 08 Nov, 2021 5 commits
  23. 25 Oct, 2021 1 commit
  24. 19 Oct, 2021 1 commit
    • ceph: fix handling of "meta" errors · 1bd85aa6
      Jeff Layton authored
      Currently, we check the wb_err too early for directories, before all of
      the unsafe child requests have been waited on. In order to fix that we
      need to check mapping->wb_err later, nearer the end of ceph_fsync.
      
      We also have an overly-complex method for tracking errors after
      blocklisting. The errors recorded in cleanup_session_requests go to a
      completely separate field in the inode, but we end up reporting them the
      same way we would for any other error (in fsync).
      
      There's no real benefit to tracking these errors in two different
      places, since the only reporting mechanism for them is in fsync, and
      we'd need to advance them both every time.
      
      Given that, we can just remove i_meta_err, and convert the places that
      used it to use mapping->wb_err instead. That also fixes
      the original problem by ensuring that we do a check_and_advance of the
      wb_err at the end of the fsync op.
      
      Cc: stable@vger.kernel.org
      URL: https://tracker.ceph.com/issues/52864
      
      Reported-by: Patrick Donnelly <pdonnell@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>