1. 05 Sep, 2024 2 commits
    • fhandle: expose u64 mount id to name_to_handle_at(2) · 4356d575
      Aleksa Sarai authored
      Now that statx(2) provides a unique 64-bit mount ID, name_to_handle_at(2)
      can return a file handle together with its mount ID in a race-free way,
      without needing to worry about racing with /proc/mountinfo parsing or
      having to open a file just to do statx(2).
      
      While this is not necessary if you are using AT_EMPTY_PATH and don't
      care about an extra statx(2) call, users that pass full paths into
      name_to_handle_at(2) need to know which mount the file handle comes from
      (to make sure they don't try to open_by_handle_at a file handle from a
      different filesystem) and switching to AT_EMPTY_PATH would require
      allocating a file for every name_to_handle_at(2) call, turning
      
        err = name_to_handle_at(-EBADF, "/foo/bar/baz", &handle, &mntid,
                                AT_HANDLE_MNT_ID_UNIQUE);
      
      into
      
        int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC);
        err1 = name_to_handle_at(fd, "", &handle, &unused_mntid, AT_EMPTY_PATH);
        err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &statxbuf);
        mntid = statxbuf.stx_mnt_id;
        close(fd);
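      
The single-call form above can be wrapped in a small helper. The following is a minimal sketch, not the kernel's or any libc's API: the `handle_result` struct and `get_handle_with_mnt_id()` are hypothetical names, and the fallback value `0x001` for `AT_HANDLE_MNT_ID_UNIQUE` is an assumption for libc headers that do not define the flag yet. The raw syscall is used because the libc wrapper declares the mount ID argument as `int *`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback value for libc headers that predate the flag. */
#ifndef AT_HANDLE_MNT_ID_UNIQUE
#define AT_HANDLE_MNT_ID_UNIQUE 0x001
#endif

/* Hypothetical result type, not part of any kernel or libc API. */
struct handle_result {
	struct file_handle *handle; /* heap-allocated; caller must free() */
	uint64_t mnt_id;            /* unique 64-bit mount ID */
};

/*
 * Hypothetical helper: fetch a file handle and the unique mount ID for
 * `path` with a single syscall. Returns 0 on success or a negative errno
 * value. Kernels that do not know the flag are expected to fail the call
 * (typically with -EINVAL), in which case callers would need to fall back
 * to the open + statx(2) sequence.
 */
static int get_handle_with_mnt_id(const char *path, struct handle_result *out)
{
	struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	uint64_t mnt_id = 0;

	if (!fh)
		return -ENOMEM;
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* With AT_HANDLE_MNT_ID_UNIQUE the kernel writes a 64-bit mount ID. */
	if (syscall(SYS_name_to_handle_at, AT_FDCWD, path, fh, &mnt_id,
		    AT_HANDLE_MNT_ID_UNIQUE) < 0) {
		int err = errno;

		free(fh);
		return -err;
	}

	out->handle = fh;
	out->mnt_id = mnt_id;
	return 0;
}
```

A caller would compare `mnt_id` against the mount it intends to pass to open_by_handle_at(2) and free `handle` when done.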
      
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-2-10c2c4c16708@cyphar.com
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • uapi: explain how per-syscall AT_* flags should be allocated · b4fef22c
      Aleksa Sarai authored
      Unfortunately, the way we have gone about adding new AT_* flags has
      been a little messy. In the beginning, all of the AT_* flags had generic
      meanings and so it made sense to share the flag bits indiscriminately.
      However, we inevitably ran into syscalls that needed their own
      syscall-specific flags. Due to the lack of a planned out policy, we
      ended up with the following situations:
      
       * Existing syscalls adding new features tended to use new AT_* bits,
         with some effort taken to try to re-use bits for flags that were so
         obviously syscall specific that they only make sense for a single
         syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet).
      
         Given the constraints of bitflags, this works well in practice, but
         ideally (to avoid future confusion) we would plan ahead and define a
         set of "per-syscall bits" ahead of time so that when allocating new
         bits we don't end up with a complete mish-mash of which bits are
         supposed to be per-syscall and which aren't.
      
       * New syscalls dealt with this in several ways:
      
         - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and
           fspick(2)) created their own separate flag spaces that have no
           overlap with the AT_* flags. Most of these ended up allocating
           their bits sequentially.
      
           In the case of move_mount(2) and fspick(2), several flags have
           identical meanings to AT_* flags but were allocated in their own
           flag space.
      
           This makes sense for syscalls that will never share AT_* flags, but
           for some syscalls this leads to duplication with AT_* flags in a
           way that could cause confusion (if renameat2(2) grew a
           RENAME_EMPTY_PATH it seems likely that users could mistake it for
           AT_EMPTY_PATH since it is an *at(2) syscall).
      
         - Some syscalls unfortunately ended up both creating their own flag
           space while also using bits from other flag spaces. The most
           obvious example is open_tree(2), where the standard usage ends up
           using flags from *THREE* separate flag spaces:
      
             open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE);
      
           (Note that O_CLOEXEC is also platform-specific, so several future
           OPEN_TREE_* bits are also made unusable in one fell swoop.)
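
To make the overlap concrete, here is a small sketch of the bit values involved. The `EX_*` macros are local copies for illustration only; their values are what I believe the uapi headers define, so treat them as assumptions and check them against your own headers. `flags_overlap()` and `open_tree_example_flags()` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>

/* Local illustrative copies (assumed values, see the uapi headers). */
#define EX_AT_EACCESS      0x200  /* faccessat(2) */
#define EX_AT_REMOVEDIR    0x200  /* unlinkat(2) */
#define EX_AT_HANDLE_FID   0x200  /* name_to_handle_at(2) */
#define EX_AT_RECURSIVE    0x8000 /* generic AT_* flag */
#define EX_OPEN_TREE_CLONE 1      /* open_tree(2)'s own flag space */

/* True if the two flag values share at least one bit. */
static bool flags_overlap(unsigned int a, unsigned int b)
{
	return (a & b) != 0;
}

/* The open_tree(2) flag word from the example: three flag spaces in one. */
static unsigned int open_tree_example_flags(void)
{
	return EX_OPEN_TREE_CLONE | (unsigned int)O_CLOEXEC | EX_AT_RECURSIVE;
}
```

The triplet shares a single bit because each flag is only meaningful to one syscall, while the open_tree(2) word has to keep its three sources from colliding by hand.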
      
      It's not entirely clear to me what the "right" choice is for new
      syscalls. Just saying that all future VFS syscalls should use AT_* flags
      doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which
      don't make much sense to burn generic AT_* flags for) and move_mount(2)
      has separate AT_*-like flags for both the source and target so separate
      flags are needed anyway (though it seems possible that renameat2(2)
      could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame
      they can't be reused).
      
      But at least for syscalls that _do_ choose to use AT_* flags, we should
      explicitly state the policy that 0x2ff is currently intended for
      per-syscall flags and that new flags should err on the side of
      overlapping with existing flag bits (so we can extend the scope of
      generic flags in the future if necessary).
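
As a rough illustration of that policy, a flag value can be classified by testing it against the 0x2ff mask. This is only a sketch: `EX_AT_PER_SYSCALL_MASK` and `at_flag_is_per_syscall()` are hypothetical names, not identifiers from the uapi headers.

```c
#include <stdbool.h>

/* Bits the commit message describes as intended for per-syscall flags. */
#define EX_AT_PER_SYSCALL_MASK 0x2ffu

/*
 * True if every set bit of `flag` lies inside the per-syscall range.
 * A zero value carries no flag at all, so it is reported as false.
 */
static bool at_flag_is_per_syscall(unsigned int flag)
{
	return flag != 0 && (flag & ~EX_AT_PER_SYSCALL_MASK) == 0;
}
```

For example, 0x200 (the value shared by AT_EACCESS and AT_REMOVEDIR) falls inside the mask, while 0x100 (AT_SYMLINK_NOFOLLOW) and 0x1000 (AT_EMPTY_PATH) do not and remain generic.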
      
      And add AT_* aliases for the RENAME_* flags to further cement that
      renameat2(2) is an *at(2) syscall, just with its own per-syscall flags.
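
A sketch of how the aliases might be used follows. The fallback `#define`s assume the values 0x0001, 0x0002 and 0x0004 (matching RENAME_NOREPLACE, RENAME_EXCHANGE and RENAME_WHITEOUT) for headers that do not carry the aliases yet, and `rename_noreplace()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback values for headers without the AT_RENAME_* aliases. */
#ifndef AT_RENAME_NOREPLACE
#define AT_RENAME_NOREPLACE 0x0001
#endif
#ifndef AT_RENAME_EXCHANGE
#define AT_RENAME_EXCHANGE 0x0002
#endif
#ifndef AT_RENAME_WHITEOUT
#define AT_RENAME_WHITEOUT 0x0004
#endif

/* The aliases must stay bit-compatible with the RENAME_* flags. */
_Static_assert(AT_RENAME_NOREPLACE == (1 << 0), "alias of RENAME_NOREPLACE");
_Static_assert(AT_RENAME_EXCHANGE == (1 << 1), "alias of RENAME_EXCHANGE");
_Static_assert(AT_RENAME_WHITEOUT == (1 << 2), "alias of RENAME_WHITEOUT");

/*
 * Hypothetical helper: rename oldpath to newpath, failing instead of
 * replacing an existing target. Returns 0 or a negative errno value.
 */
static int rename_noreplace(const char *oldpath, const char *newpath)
{
	if (syscall(SYS_renameat2, AT_FDCWD, oldpath, AT_FDCWD, newpath,
		    AT_RENAME_NOREPLACE) < 0)
		return -errno;
	return 0;
}
```

Because the aliases carry the same bit values, existing callers that pass RENAME_* flags are unaffected.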
      
      Suggested-by: Amir Goldstein <amir73il@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-1-10c2c4c16708@cyphar.com
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  2. 30 Aug, 2024 38 commits