1. 01 Mar, 2024 4 commits
    • Christian Brauner's avatar
      pidfs: convert to path_from_stashed() helper · b28ddcc3
      Christian Brauner authored
      Moving pidfds from the anonymous inode infrastructure to a separate tiny
      in-kernel filesystem similar to sockfs, pipefs, and anon_inodefs causes
      selinux denials and thus various userspace components that make heavy
      use of pidfds to fail as pidfds used anon_inode_getfile() which aren't
      subject to any LSM hooks. But dentry_open() is and that would cause
      regressions.
      
      The failures that are seen are selinux denials. But the core failure is
      dbus-broker. That cascades into other services failing that depend on
      dbus-broker. For example, when dbus-broker fails to start polkit and all
      the others won't be able to work because they depend on dbus-broker.
      
      The reason for dbus-broker failing is because it doesn't handle failures
      for SO_PEERPIDFD correctly. Last kernel release we introduced
      SO_PEERPIDFD (and SCM_PIDFD). SO_PEERPIDFD allows dbus-broker and polkit
      and others to receive a pidfd for the peer of an AF_UNIX socket. This is
      the first time in the history of Linux that we can safely authenticate
      clients in a race-free manner.
      
      dbus-broker immediately made use of this but messed up the error
      checking. It only allowed EINVAL as a valid failure for SO_PEERPIDFD.
      That's obviously problematic not just because of LSM denials but because
      of seccomp denials that would prevent SO_PEERPIDFD from working; or any
      other new error code from there.
      
      So this is catching a flawed implementation in dbus-broker as well. It
      has to fallback to the old pid-based authentication when SO_PEERPIDFD
      doesn't work no matter the reasons otherwise it'll always risk such
      failures. So overall that LSM denial should not have caused dbus-broker
      to fail. It can never assume that a feature released one kernel ago like
      SO_PEERPIDFD can be assumed to be available.
      
      So, the next fix separate from the selinux policy update is to try and
      fix dbus-broker at [3]. That should make it into Fedora as well. In
      addition the selinux reference policy should also be updated. See [4]
      for that. If Selinux is in enforcing mode in userspace and it encounters
      anything that it doesn't know about it will deny it by default. And the
      policy is entirely in userspace including declaring new types for stuff
      like nsfs or pidfs to allow it.
      
      For now we continue to raise S_PRIVATE on the inode if it's a pidfs
      inode which means things behave exactly like before.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2265630
      Link: https://github.com/fedora-selinux/selinux-policy/pull/2050
      Link: https://github.com/bus1/dbus-broker/pull/343 [3]
      Link: https://github.com/SELinuxProject/refpolicy/pull/762 [4]
      Reported-by: default avatarNathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/r/20240222190334.GA412503@dev-arch.thelio-3990X
      Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@braunerSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      b28ddcc3
    • Christian Brauner's avatar
      nsfs: convert to path_from_stashed() helper · 1fa08aec
      Christian Brauner authored
      Use the newly added path_from_stashed() helper for nsfs.
      
      Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@braunerSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      1fa08aec
    • Christian Brauner's avatar
      libfs: add path_from_stashed() · 07fd7c32
      Christian Brauner authored
      Add a helper for both nsfs and pidfs to reuse an already stashed dentry
      or to add and stash a new dentry.
      
      Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@braunerSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      07fd7c32
    • Christian Brauner's avatar
      pidfd: add pidfs · cb12fd8e
      Christian Brauner authored
      This moves pidfds from the anonymous inode infrastructure to a tiny
      pseudo filesystem. This has been on my todo for quite a while as it will
      unblock further work that we weren't able to do simply because of the
      very justified limitations of anonymous inodes. Moving pidfds to a tiny
      pseudo filesystem allows:
      
      * statx() on pidfds becomes useful for the first time.
      * pidfds can be compared simply via statx() and then comparing inode
        numbers.
      * pidfds have unique inode numbers for the system lifetime.
      * struct pid is now stashed in inode->i_private instead of
        file->private_data. This means it is now possible to introduce
        concepts that operate on a process once all file descriptors have been
        closed. A concrete example is kill-on-last-close.
      * file->private_data is freed up for per-file options for pidfds.
      * Each struct pid will refer to a different inode but the same struct
        pid will refer to the same inode if it's opened multiple times. In
        contrast to now where each struct pid refers to the same inode. Even
        if we were to move to anon_inode_create_getfile() which creates new
        inodes we'd still be associating the same struct pid with multiple
        different inodes.
      
      The tiny pseudo filesystem is not visible anywhere in userspace exactly
      like e.g., pipefs and sockfs. There's no lookup, there's no complex
      inode operations, nothing. Dentries and inodes are always deleted when
      the last pidfd is closed.
      
      We allocate a new inode for each struct pid and we reuse that inode for
      all pidfds. We use iget_locked() to find that inode again based on the
      inode number which isn't recycled. We allocate a new dentry for each
      pidfd that uses the same inode. That is similar to anonymous inodes
      which reuse the same inode for thousands of dentries. For pidfds we're
      talking way less than that. There usually won't be a lot of concurrent
      openers of the same struct pid. They can probably often be counted on
      two hands. I know that systemd does use separate pidfd for the same
      struct pid for various complex process tracking issues. So I think with
      that things actually become way simpler. Especially because we don't
      have to care about lookup. Dentries and inodes continue to be always
      deleted.
      
      The code is entirely optional and fairly small. If it's not selected we
      fallback to anonymous inodes. Heavily inspired by nsfs which uses a
      similar stashing mechanism just for namespaces.
      
      Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.orgSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      cb12fd8e
  2. 28 Feb, 2024 1 commit
  3. 21 Feb, 2024 1 commit
  4. 10 Feb, 2024 2 commits
  5. 07 Feb, 2024 2 commits
  6. 06 Feb, 2024 3 commits
  7. 02 Feb, 2024 7 commits
  8. 21 Jan, 2024 20 commits