    Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ecd7db20
    Linus Torvalds authored
    Pull libfs and tmpfs updates from Christian Brauner:
     "This cycle saw a lot of work for tmpfs that required changes to the
      vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this
      cycle. Things will go back to mm next cycle.
    
      Features
      ========
    
       - By far the biggest work is the quota support for tmpfs. New tmpfs
         quota infrastructure is added to support it and a new QFMT_SHMEM
         uapi option is exposed.
    
         This offers user and group quotas to tmpfs (project quotas will be
         added later). As with other filesystems, tmpfs quotas are not yet
         supported within user namespaces.
    
       - Add support for user xattrs. While tmpfs has long supported
         security xattrs (security.*) and POSIX ACLs, it lacked support for
         user xattrs (user.*). With this pull request tmpfs will be able to
         support a limited number of user xattrs.
    
         This is accompanied by a fix (see below) to limit persistent simple
         xattr allocations.
    
       - Add support for stable directory offsets. Currently tmpfs relies on
         the libfs provided cursor-based mechanism for readdir. This causes
         issues when a tmpfs filesystem is exported via NFS.
    
         NFS clients do not open directories. Instead, each server-side
         readdir operation opens the directory, reads it, and then closes
         it. Since the cursor state for that directory is associated with
         the opened file it is discarded after each readdir operation. Such
         directory offsets are not just cached by NFS clients but also
         various userspace libraries based on these clients.
    
         As it stands there is no way to invalidate the caches when
         directory offsets have changed and the whole application depends on
         unchanging directory offsets.
    
         At LSFMM we discussed how to solve this problem and decided to
         support stable directory offsets. libfs now allows filesystems like
         tmpfs to use an xarray to map a directory offset to a dentry. This
         mechanism is currently only used by tmpfs but can be supported by
         others as well.
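
The stable-offset idea above can be illustrated with a small userspace model. This is only a sketch and not the libfs code: the class `OffsetDir` and its methods are hypothetical names, a Python dict stands in for the xarray, and starting offsets at 2 (leaving 0 and 1 for "." and "..") is an assumption made for the illustration.

```python
class OffsetDir:
    """Toy userspace model of a directory with stable offsets."""

    def __init__(self):
        self._entries = {}      # offset -> name; stands in for the xarray
        self._next_offset = 2   # assumption: 0 and 1 are left for "." and ".."

    def add(self, name):
        """Give the new entry an offset that it keeps for its lifetime."""
        offset = self._next_offset
        self._next_offset += 1
        self._entries[offset] = name
        return offset

    def remove(self, name):
        """Drop an entry; the offsets of all other entries are unchanged."""
        for offset, entry in list(self._entries.items()):
            if entry == name:
                del self._entries[offset]
                return
        raise KeyError(name)

    def readdir(self, start=0, count=None):
        """Return (offset, name) pairs with offset >= start, in offset order.

        No per-open cursor is kept: the caller passes the offset to resume
        from, which is what a stateless NFS-style reader needs.
        """
        items = sorted((o, n) for o, n in self._entries.items() if o >= start)
        return items if count is None else items[:count]


d = OffsetDir()
for name in ("a", "b", "c", "d"):
    d.add(name)

first = d.readdir(count=2)          # first stateless call returns a and b
resume_at = first[-1][0] + 1        # the client remembers only an offset
d.remove("a")                       # the directory changes between calls
rest = d.readdir(start=resume_at)   # second call resumes at c, skipping nothing
```

Because each entry keeps its offset until it is removed, a reader that reopens the directory for every call can still resume correctly from a cached offset.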
    
      Fixes
      =====
    
       - Change persistent simple xattrs allocations in libfs from
         GFP_KERNEL to GFP_KERNEL_ACCOUNT so they're subject to memory
         cgroup limits. Since this is a change to libfs it affects both
         tmpfs and kernfs.
    
       - Correctly verify {g,u}id mount options.
    
         A new filesystem context is created via fsopen() which records the
         namespace that becomes the owning namespace of the superblock when
         fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are
         mountable in namespaces. However, fsconfig() calls can occur in a
         namespace different from the namespace where fsopen() has been
         called.
    
         Currently, when fsconfig() is called to set {g,u}id mount options
         the requested {g,u}id is mapped into a k{g,u}id according to the
         namespace where fsconfig() was called from. The resulting k{g,u}id
         is not guaranteed to be resolvable in the namespace of the
         filesystem (the one that fsopen() was called in).
    
         This means it's possible for an unprivileged user to create files
         owned by any group in a tmpfs mount since it's possible to set the
         setid bits on the tmpfs directory.
    
         The contract for {g,u}id mount options and {g,u}id values in
         general set from userspace has always been that they are translated
         according to the caller's idmapping. In that respect, tmpfs has been
         doing the correct thing. But since tmpfs is mountable in
         unprivileged contexts it is also necessary to verify that the
         resulting k{g,u}id is representable in the namespace of the
         superblock to avoid such bugs.
    
         The new mount API's cross-namespace delegation abilities are
         already widely used. After talking to several userspace projects,
         this is the most faithful solution with minimal regression risk"
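
The two-step check described for {g,u}id mount options (translate in the caller's namespace, then verify against the filesystem's namespace) can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: `Extent`, `map_id_down`, `has_mapping`, `parse_id_option`, and the example namespaces are hypothetical names and values chosen for illustration, and a namespace is reduced to a plain list of id-mapping extents.

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    first: int        # first id as seen inside the namespace
    lower_first: int  # corresponding first kernel-level id
    count: int        # number of consecutive ids covered


def map_id_down(extents: Sequence[Extent], ns_id: int) -> Optional[int]:
    """Translate an id seen inside a namespace to a kernel-level id."""
    for e in extents:
        if e.first <= ns_id < e.first + e.count:
            return e.lower_first + (ns_id - e.first)
    return None


def has_mapping(extents: Sequence[Extent], kid: int) -> bool:
    """Check that a kernel-level id is representable in a namespace."""
    return any(e.lower_first <= kid < e.lower_first + e.count for e in extents)


def parse_id_option(caller_ns, fs_ns, value: int) -> int:
    """Model of the described check for a uid= or gid= mount option."""
    # Step 1: translate according to the caller's idmapping (the contract).
    kid = map_id_down(caller_ns, value)
    if kid is None:
        raise ValueError("id is not mapped in the caller's namespace")
    # Step 2: the result must also be representable in the namespace that
    # owns the superblock, i.e. the one recorded when the context was opened.
    if not has_mapping(fs_ns, kid):
        raise ValueError("id is not representable in the filesystem's namespace")
    return kid


# Hypothetical namespaces: an identity-mapped initial namespace and a
# container whose ids 0..65535 map to kernel ids 100000..165535.
INIT_NS = [Extent(0, 0, 2**32 - 1)]
CONTAINER_NS = [Extent(0, 100000, 65536)]

ok_same_ns = parse_id_option(CONTAINER_NS, CONTAINER_NS, 5)
ok_delegated = parse_id_option(INIT_NS, CONTAINER_NS, 100005)
```

In this model a privileged helper in the initial namespace can still configure the mount on behalf of the container, which keeps cross-namespace delegation working, while an id such as 0 from the initial namespace is rejected because it has no mapping in the container.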
    
    * tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
      tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
      mm: invalidation check mapping before folio_contains
      tmpfs: trivial support for direct IO
      tmpfs,xattr: enable limited user extended attributes
      tmpfs: track free_ispace instead of free_inodes
      xattr: simple_xattr_set() return old_xattr to be freed
      tmpfs: verify {g,u}id mount options correctly
      shmem: move spinlock into shmem_recalc_inode() to fix quota support
      libfs: Remove parent dentry locking in offset_iterate_dir()
      libfs: Add a lock class for the offset map's xa_lock
      shmem: stable directory offsets
      shmem: Refactor shmem_symlink()
      libfs: Add directory operations for stable offsets
      shmem: fix quota lock nesting in huge hole handling
      shmem: Add default quota limit mount options
      shmem: quota support
      shmem: prepare shmem quota infrastructure
      quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks
      shmem: make shmem_get_inode() return ERR_PTR instead of NULL
      shmem: make shmem_inode_acct_block() return error