1. 14 Aug, 2023 4 commits
    • Christian Brauner's avatar
      fs: add FSCONFIG_CMD_CREATE_EXCL · 22ed7ecd
      Christian Brauner authored
      Summary
      =======
      
      This introduces FSCONFIG_CMD_CREATE_EXCL which will allows userspace to
      implement something like mount -t ext4 --exclusive /dev/sda /B which
      fails if a superblock for the requested filesystem does already exist:
      
      Before this patch
      -----------------
      
      $ sudo ./move-mount -f xfs -o source=/dev/sda4 /A
      Requesting filesystem type xfs
      Mount options requested: source=/dev/sda4
      Attaching mount at /A
      Moving single attached mount
      Setting key(source) with val(/dev/sda4)
      
      $ sudo ./move-mount -f xfs -o source=/dev/sda4 /B
      Requesting filesystem type xfs
      Mount options requested: source=/dev/sda4
      Attaching mount at /B
      Moving single attached mount
      Setting key(source) with val(/dev/sda4)
      
      After this patch with --exclusive as a switch for FSCONFIG_CMD_CREATE_EXCL
      --------------------------------------------------------------------------
      
      $ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /A
      Requesting filesystem type xfs
      Request exclusive superblock creation
      Mount options requested: source=/dev/sda4
      Attaching mount at /A
      Moving single attached mount
      Setting key(source) with val(/dev/sda4)
      
      $ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /B
      Requesting filesystem type xfs
      Request exclusive superblock creation
      Mount options requested: source=/dev/sda4
      Attaching mount at /B
      Moving single attached mount
      Setting key(source) with val(/dev/sda4)
      Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed
      
      Details
      =======
      
      As mentioned on the list (cf. [1]-[3]) mount requests like
      mount -t ext4 /dev/sda /A are ambigous for userspace. Either a new
      superblock has been created and mounted or an existing superblock has
      been reused and a bind-mount has been created.
      
      This becomes clear in the following example where two processes create
      the same mount for the same block device:
      
      P1                                                              P2
      fd_fs = fsopen("ext4");                                         fd_fs = fsopen("ext4");
      fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda");     fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda");
      fsconfig(fd_fs, FSCONFIG_SET_STRING, "dax", "always");          fsconfig(fd_fs, FSCONFIG_SET_STRING, "resuid", "1000");
      
      // wins and creates superblock
      fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
                                                                      // finds compatible superblock of P1
                                                                      // spins until P1 sets SB_BORN and grabs a reference
                                                                      fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
      
      fd_mnt1 = fsmount(fd_fs);                                       fd_mnt2 = fsmount(fd_fs);
      move_mount(fd_mnt1, "/A")                                       move_mount(fd_mnt2, "/B")
      
      Not just does P2 get a bind-mount but the mount options that P2
      requestes are silently ignored. The VFS itself doesn't, can't and
      shouldn't enforce filesystem specific mount option compatibility. It
      only enforces incompatibility for read-only <-> read-write transitions:
      
      mount -t ext4       /dev/sda /A
      mount -t ext4 -o ro /dev/sda /B
      
      The read-only request will fail with EBUSY as the VFS can't just
      silently transition a superblock from read-write to read-only or vica
      versa without risking security issues.
      
      To userspace this silent superblock reuse can become a security issue in
      because there is currently no straightforward way for userspace to know
      that they did indeed manage to create a new superblock and didn't just
      reuse an existing one.
      
      This adds a new FSCONFIG_CMD_CREATE_EXCL command to fsconfig() that
      returns EBUSY if an existing superblock would be reused. Userspace that
      needs to be sure that it did create a new superblock with the requested
      mount options can request superblock creation using this command. If the
      command succeeds they can be sure that they did create a new superblock
      with the requested mount options.
      
      This requires the new mount api. With the old mount api it would be
      necessary to plumb this through every legacy filesystem's
      file_system_type->mount() method. If they want this feature they are
      most welcome to switch to the new mount api.
      
      Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on
      each high-level superblock creation helper:
      
      (1) get_tree_nodev()
      
          Always allocate new superblock. Hence, FSCONFIG_CMD_CREATE and
          FSCONFIG_CMD_CREATE_EXCL are equivalent.
      
          The binderfs or overlayfs filesystems are examples.
      
      (4) get_tree_keyed()
      
          Finds an existing superblock based on sb->s_fs_info. Hence,
          FSCONFIG_CMD_CREATE would reuse an existing superblock whereas
          FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.
      
          The mqueue or nfsd filesystems are examples.
      
      (2) get_tree_bdev()
      
          This effectively works like get_tree_keyed().
      
          The ext4 or xfs filesystems are examples.
      
      (3) get_tree_single()
      
          Only one superblock of this filesystem type can ever exist.
          Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock
          whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.
      
          The securityfs or configfs filesystems are examples.
      
          Note that some single-instance filesystems never destroy the
          superblock once it has been created during the first mount. For
          example, if securityfs has been mounted at least onces then the
          created superblock will never be destroyed again as long as there is
          still an LSM making use it. Consequently, even if securityfs is
          unmounted and the superblock seemingly destroyed it really isn't
          which means that FSCONFIG_CMD_CREATE_EXCL will continue rejecting
          reusing an existing superblock.
      
          This is acceptable thugh since special purpose filesystems such as
          this shouldn't have a need to use FSCONFIG_CMD_CREATE_EXCL anyway
          and if they do it's probably to make sure that mount options aren't
          ignored.
      
      Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on
      filesystems that make use of the low-level sget_fc() helper directly.
      They're all effectively variants on get_tree_keyed(), get_tree_bdev(),
      or get_tree_nodev():
      
      (5) mtd_get_sb()
      
          Similar logic to get_tree_keyed().
      
      (6) afs_get_tree()
      
          Similar logic to get_tree_keyed().
      
      (7) ceph_get_tree()
      
          Similar logic to get_tree_keyed().
      
          Already explicitly allows forcing the allocation of a new superblock
          via CEPH_OPT_NOSHARE. This turns it into get_tree_nodev().
      
      (8) fuse_get_tree_submount()
      
          Similar logic to get_tree_nodev().
      
      (9) fuse_get_tree()
      
          Forces reuse of existing FUSE superblock.
      
          Forces reuse of existing superblock if passed in file refers to an
          existing FUSE connection.
          If FSCONFIG_CMD_CREATE_EXCL is specified together with an fd
          referring to an existing FUSE connections this would cause the
          superblock reusal to fail. If reusing is the intent then
          FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.
      
      (10) fuse_get_tree()
           -> get_tree_nodev()
      
          Same logic as in get_tree_nodev().
      
      (11) fuse_get_tree()
           -> get_tree_bdev()
      
          Same logic as in get_tree_bdev().
      
      (12) virtio_fs_get_tree()
      
           Same logic as get_tree_keyed().
      
      (13) gfs2_meta_get_tree()
      
           Forces reuse of existing gfs2 superblock.
      
           Mounting gfs2meta enforces that a gf2s superblock must already
           exist. If not, it will error out. Consequently, mounting gfs2meta
           with FSCONFIG_CMD_CREATE_EXCL would always fail. If reusing is the
           intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.
      
      (14) kernfs_get_tree()
      
           Similar logic to get_tree_keyed().
      
      (15) nfs_get_tree_common()
      
          Similar logic to get_tree_keyed().
      
          Already explicitly allows forcing the allocation of a new superblock
          via NFS_MOUNT_UNSHARED. This effectively turns it into
          get_tree_nodev().
      
      Link: [1] https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner
      Link: [2] https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner
      Link: [3] https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@braunerReviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230802-vfs-super-exclusive-v2-4-95dc4e41b870@kernel.org>
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      22ed7ecd
    • Christian Brauner's avatar
      fs: add vfs_cmd_reconfigure() · 11a51d8c
      Christian Brauner authored
      Split the steps to reconfigure a superblock into a tiny helper instead
      of open-coding it in the switch.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230802-vfs-super-exclusive-v2-3-95dc4e41b870@kernel.org>
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      11a51d8c
    • Christian Brauner's avatar
      fs: add vfs_cmd_create() · dae8b08d
      Christian Brauner authored
      Split the steps to create a superblock into a tiny helper. This will
      make the next patch easier to follow.
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230802-vfs-super-exclusive-v2-2-95dc4e41b870@kernel.org>
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      dae8b08d
    • Christian Brauner's avatar
      super: remove get_tree_single_reconf() · e062abae
      Christian Brauner authored
      The get_tree_single_reconf() helper isn't used anywhere. Remove it.
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230802-vfs-super-exclusive-v2-1-95dc4e41b870@kernel.org>
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      e062abae
  2. 09 Jul, 2023 10 commits
  3. 08 Jul, 2023 26 commits