1. 01 Mar, 2024 1 commit
    • pidfd: add pidfs · cb12fd8e
      Christian Brauner authored
      This moves pidfds from the anonymous inode infrastructure to a tiny
      pseudo filesystem. This has been on my todo list for quite a while as it will
      unblock further work that we weren't able to do simply because of the
      very justified limitations of anonymous inodes. Moving pidfds to a tiny
      pseudo filesystem allows:
      
      * statx() on pidfds becomes useful for the first time.
      * pidfds can be compared simply via statx() and then comparing inode
        numbers.
      * pidfds have unique inode numbers for the system lifetime.
      * struct pid is now stashed in inode->i_private instead of
        file->private_data. This means it is now possible to introduce
        concepts that operate on a process once all file descriptors have been
        closed. A concrete example is kill-on-last-close.
      * file->private_data is freed up for per-file options for pidfds.
      * Each struct pid will refer to a different inode but the same struct
        pid will refer to the same inode if it's opened multiple times. This
        is in contrast to now, where every struct pid refers to the same
        inode. Even if we were to move to anon_inode_create_getfile(), which
        creates new inodes, we'd still be associating the same struct pid
        with multiple different inodes.
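
The statx() comparison from the list above could look roughly like this from userspace. This is a minimal sketch, not part of the patch: the helper names (sys_pidfd_open, fds_share_inode, pidfd_self_matches) are made up for illustration, and it assumes a libc that exposes statx() and a SYS_pidfd_open definition. Comparing the device numbers in addition to the inode number is an extra precaution that the commit message does not ask for. On a kernel without pidfs the comparison is not meaningful, because all pidfds then share the single anonymous inode.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* AT_EMPTY_PATH */
#include <sys/stat.h>     /* statx(), struct statx, STATX_INO */
#include <sys/syscall.h>  /* SYS_pidfd_open */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw pidfd_open system call. */
int sys_pidfd_open(pid_t pid, unsigned int flags)
{
    return (int)syscall(SYS_pidfd_open, pid, flags);
}

/*
 * Returns 1 if both descriptors refer to the same inode, 0 if they do
 * not, and -1 if either statx() call fails or reports no inode number.
 */
int fds_share_inode(int fd_a, int fd_b)
{
    struct statx a, b;

    if (statx(fd_a, "", AT_EMPTY_PATH, STATX_INO, &a) < 0 ||
        statx(fd_b, "", AT_EMPTY_PATH, STATX_INO, &b) < 0)
        return -1;
    if (!(a.stx_mask & STATX_INO) || !(b.stx_mask & STATX_INO))
        return -1;

    return a.stx_ino == b.stx_ino &&
           a.stx_dev_major == b.stx_dev_major &&
           a.stx_dev_minor == b.stx_dev_minor;
}

/*
 * Opens two pidfds for the calling process and compares them.
 * Returns the comparison result, or -1 if a pidfd cannot be opened.
 */
int pidfd_self_matches(void)
{
    int fd1 = sys_pidfd_open(getpid(), 0);
    int fd2 = sys_pidfd_open(getpid(), 0);
    int ret = -1;

    if (fd1 >= 0 && fd2 >= 0)
        ret = fds_share_inode(fd1, fd2);
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return ret;
}
```

With pidfs, two pidfds for the same process should compare equal and pidfds for different processes should not.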
      
      The tiny pseudo filesystem is not visible anywhere in userspace, exactly
      like e.g., pipefs and sockfs. There's no lookup, there are no complex
      inode operations, nothing. Dentries and inodes are always deleted when
      the last pidfd is closed.
      
      We allocate a new inode for each struct pid and we reuse that inode for
      all pidfds. We use iget_locked() to find that inode again based on the
      inode number which isn't recycled. We allocate a new dentry for each
      pidfd that uses the same inode. That is similar to anonymous inodes
      which reuse the same inode for thousands of dentries. For pidfds we're
      talking way less than that. There usually won't be a lot of concurrent
      openers of the same struct pid. They can probably often be counted on
      two hands. I know that systemd does use separate pidfds for the same
      struct pid for various complex process tracking issues. So I think with
      that things actually become way simpler. Especially because we don't
      have to care about lookup. Dentries and inodes continue to be always
      deleted.
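
The inode reuse described above can be illustrated with a small userspace toy model. This is not the kernel code: every toy_* name is invented for this sketch, a linked list stands in for the inode cache that iget_locked() searches, and the per-pidfd dentries are collapsed into a simple opener count. It also assumes that the never-recycled inode number is handed out when the struct pid is set up, which the text above does not spell out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_pid {
    uint64_t ino;              /* assigned once, never recycled */
};

struct toy_inode {
    uint64_t ino;
    struct toy_pid *pid;       /* stand-in for inode->i_private */
    unsigned int openers;      /* pidfds currently open on this inode */
    struct toy_inode *next;
};

static uint64_t toy_last_ino;        /* monotonic inode number source */
static struct toy_inode *toy_cache;  /* stand-in for the inode cache */

/* Give a new pid its unique inode number (assumed to happen at setup). */
void toy_pid_init(struct toy_pid *pid)
{
    pid->ino = ++toy_last_ino;
}

/* Look up a cached inode by number; NULL if none is cached. */
struct toy_inode *toy_find(uint64_t ino)
{
    for (struct toy_inode *i = toy_cache; i; i = i->next)
        if (i->ino == ino)
            return i;
    return NULL;
}

/* Open a "pidfd": reuse the pid's inode if cached, else allocate it. */
struct toy_inode *toy_pidfd_open(struct toy_pid *pid)
{
    struct toy_inode *inode = toy_find(pid->ino);

    if (!inode) {
        inode = calloc(1, sizeof(*inode));
        if (!inode)
            return NULL;
        inode->ino = pid->ino;
        inode->pid = pid;
        inode->next = toy_cache;
        toy_cache = inode;
    }
    inode->openers++;
    return inode;
}

/* Close a "pidfd": the inode is deleted when the last opener goes away. */
void toy_pidfd_close(struct toy_inode *inode)
{
    assert(inode->openers > 0);
    if (--inode->openers > 0)
        return;
    for (struct toy_inode **pp = &toy_cache; *pp; pp = &(*pp)->next) {
        if (*pp == inode) {
            *pp = inode->next;
            break;
        }
    }
    free(inode);
}
```

In this model, opening the same toy_pid twice returns the same toy_inode, a different toy_pid gets a different one, and nothing stays cached once the last opener has closed.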
      
      The code is entirely optional and fairly small. If it's not selected we
      fall back to anonymous inodes. Heavily inspired by nsfs which uses a
      similar stashing mechanism just for namespaces.
      
      Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  2. 28 Feb, 2024 1 commit
  3. 21 Feb, 2024 1 commit
  4. 10 Feb, 2024 2 commits
  5. 07 Feb, 2024 2 commits
  6. 06 Feb, 2024 3 commits
  7. 02 Feb, 2024 7 commits
  8. 21 Jan, 2024 23 commits