An error occurred fetching the project authors.
  1. 23 Aug, 2018 1 commit
    • Christian Brauner's avatar
      getxattr: use correct xattr length · 82c9a927
      Christian Brauner authored
      When running in a container with a user namespace, if you call getxattr
      with name = "system.posix_acl_access" and size % 8 != 4, then getxattr
      silently skips the user namespace fixup that it normally does resulting in
      un-fixed-up data being returned.
      This is caused by posix_acl_fix_xattr_to_user() being passed the total
      buffer size and not the actual size of the xattr as returned by
      vfs_getxattr().
      This commit passes the actual length of the xattr as returned by
      vfs_getxattr() down.
      
      A reproducer for the issue is:
      
        touch acl_posix
      
        setfacl -m user:0:rwx acl_posix
      
      and the compile:
      
        #define _GNU_SOURCE
        #include <errno.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>
        #include <unistd.h>
        #include <attr/xattr.h>
      
        /* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */
        int main(int argc, void **argv)
        {
                ssize_t ret1, ret2;
                char buf1[128], buf2[132];
                int fret = EXIT_SUCCESS;
                char *file;
      
                if (argc < 2) {
                        fprintf(stderr,
                                "Please specify a file with "
                                "\"system.posix_acl_access\" permissions set\n");
                        _exit(EXIT_FAILURE);
                }
                file = argv[1];
      
                ret1 = getxattr(file, "system.posix_acl_access",
                                buf1, sizeof(buf1));
                if (ret1 < 0) {
                        fprintf(stderr, "%s - Failed to retrieve "
                                        "\"system.posix_acl_access\" "
                                        "from \"%s\"\n", strerror(errno), file);
                        _exit(EXIT_FAILURE);
                }
      
                ret2 = getxattr(file, "system.posix_acl_access",
                                buf2, sizeof(buf2));
                if (ret2 < 0) {
                        fprintf(stderr, "%s - Failed to retrieve "
                                        "\"system.posix_acl_access\" "
                                        "from \"%s\"\n", strerror(errno), file);
                        _exit(EXIT_FAILURE);
                }
      
                if (ret1 != ret2) {
                        fprintf(stderr, "The value of \"system.posix_acl_"
                                        "access\" for file \"%s\" changed "
                                        "between two successive calls\n", file);
                        _exit(EXIT_FAILURE);
                }
      
                for (ssize_t i = 0; i < ret2; i++) {
                        if (buf1[i] == buf2[i])
                                continue;
      
                        fprintf(stderr,
                                "Unexpected different in byte %zd: "
                                "%02x != %02x\n", i, buf1[i], buf2[i]);
                        fret = EXIT_FAILURE;
                }
      
                if (fret == EXIT_SUCCESS)
                        fprintf(stderr, "Test passed\n");
                else
                        fprintf(stderr, "Test failed\n");
      
                _exit(fret);
        }
      and run:
      
        ./tester acl_posix
      
      On a non-fixed up kernel this should return something like:
      
        root@c1:/# ./t
        Unexpected different in byte 16: ffffffa0 != 00
        Unexpected different in byte 17: ffffff86 != 00
        Unexpected different in byte 18: 01 != 00
      
      and on a fixed kernel:
      
        root@c1:~# ./t
        Test passed
      
      Cc: stable@vger.kernel.org
      Fixes: 2f6f0654 ("userns: Convert vfs posix_acl support to use kuids and kgids")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945Reported-by: default avatarColin Watson <cjwatson@ubuntu.com>
      Signed-off-by: default avatarChristian Brauner <christian@brauner.io>
      Acked-by: default avatarSerge Hallyn <serge@hallyn.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      82c9a927
  2. 29 May, 2018 1 commit
  3. 14 May, 2018 1 commit
  4. 04 Oct, 2017 1 commit
    • Casey Schaufler's avatar
      lsm: fix smack_inode_removexattr and xattr_getsecurity memleak · 57e7ba04
      Casey Schaufler authored
      security_inode_getsecurity() provides the text string value
      of a security attribute. It does not provide a "secctx".
      The code in xattr_getsecurity() that calls security_inode_getsecurity()
      and then calls security_release_secctx() happened to work because
      SElinux and Smack treat the attribute and the secctx the same way.
      It fails for cap_inode_getsecurity(), because that module has no
      secctx that ever needs releasing. It turns out that Smack is the
      one that's doing things wrong by not allocating memory when instructed
      to do so by the "alloc" parameter.
      
      The fix is simple enough. Change the security_release_secctx() to
      kfree() because it isn't a secctx being returned by
      security_inode_getsecurity(). Change Smack to allocate the string when
      told to do so.
      
      Note: this also fixes memory leaks for LSMs which implement
      inode_getsecurity but not release_secctx, such as capabilities.
      Signed-off-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Reported-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      57e7ba04
  5. 05 Sep, 2017 1 commit
    • Miklos Szeredi's avatar
      ovl: don't allow writing ioctl on lower layer · 7c6893e3
      Miklos Szeredi authored
      Problem with ioctl() is that it's a file operation, yet often used as an
      inode operation (i.e. modify the inode despite the file being opened for
      read-only).
      
      mnt_want_write_file() is used by filesystems in such cases to get write
      access on an arbitrary open file.
      
      Since overlayfs lets filesystems do all file operations, including ioctl,
      this can lead to mnt_want_write_file() returning OK for a lower file and
      modification of that lower file.
      
      This patch prevents modification by checking if the file is from an
      overlayfs lower layer and returning EPERM in that case.
      
      Need to introduce a mnt_want_write_file_path() variant that still does the
      old thing for inode operations that can do the copy up + modification
      correctly in such cases (fchown, fsetxattr, fremovexattr).
      
      This does not address the correctness of such ioctls on overlayfs (the
      correct way would be to copy up and attempt to perform ioctl on upper
      file).
      
      In theory this could be a regression.  We very much hope that nobody is
      relying on such a hack in any sane setup.
      
      While this patch meddles in VFS code, it has no effect on non-overlayfs
      filesystems.
      Reported-by: default avatar"zhangyi (F)" <yi.zhang@huawei.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      7c6893e3
  6. 01 Sep, 2017 1 commit
    • Serge E. Hallyn's avatar
      Introduce v3 namespaced file capabilities · 8db6c34f
      Serge E. Hallyn authored
      Root in a non-initial user ns cannot be trusted to write a traditional
      security.capability xattr.  If it were allowed to do so, then any
      unprivileged user on the host could map his own uid to root in a private
      namespace, write the xattr, and execute the file with privilege on the
      host.
      
      However supporting file capabilities in a user namespace is very
      desirable.  Not doing so means that any programs designed to run with
      limited privilege must continue to support other methods of gaining and
      dropping privilege.  For instance a program installer must detect
      whether file capabilities can be assigned, and assign them if so but set
      setuid-root otherwise.  The program in turn must know how to drop
      partial capabilities, and do so only if setuid-root.
      
      This patch introduces v3 of the security.capability xattr.  It builds a
      vfs_ns_cap_data struct by appending a uid_t rootid to struct
      vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
      namespace which mounted the filesystem, usually init_user_ns) of the
      root id in whose namespaces the file capabilities may take effect.
      
      When a task asks to write a v2 security.capability xattr, if it is
      privileged with respect to the userns which mounted the filesystem, then
      nothing should change.  Otherwise, the kernel will transparently rewrite
      the xattr as a v3 with the appropriate rootid.  This is done during the
      execution of setxattr() to catch user-space-initiated capability writes.
      Subsequently, any task executing the file which has the noted kuid as
      its root uid, or which is in a descendent user_ns of such a user_ns,
      will run the file with capabilities.
      
      Similarly when asking to read file capabilities, a v3 capability will
      be presented as v2 if it applies to the caller's namespace.
      
      If a task writes a v3 security.capability, then it can provide a uid for
      the xattr so long as the uid is valid in its own user namespace, and it
      is privileged with CAP_SETFCAP over its namespace.  The kernel will
      translate that rootid to an absolute uid, and write that to disk.  After
      this, a task in the writer's namespace will not be able to use those
      capabilities (unless rootid was 0), but a task in a namespace where the
      given uid is root will.
      
      Only a single security.capability xattr may exist at a time for a given
      file.  A task may overwrite an existing xattr so long as it is
      privileged over the inode.  Note this is a departure from previous
      semantics, which required privilege to remove a security.capability
      xattr.  This check can be re-added if deemed useful.
      
      This allows a simple setxattr to work, allows tar/untar to work, and
      allows us to tar in one namespace and untar in another while preserving
      the capability, without risking leaking privilege into a parent
      namespace.
      
      Example using tar:
      
       $ cp /bin/sleep sleepx
       $ mkdir b1 b2
       $ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1
       $ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2
       $ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
       $ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
       $ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx
         b2/sleepx = cap_sys_admin+ep
       # /opt/ltp/testcases/bin/getv3xattr b2/sleepx
         v3 xattr, rootid is 100001
      
      A patch to linux-test-project adding a new set of tests for this
      functionality is in the nsfscaps branch at github.com/hallyn/ltp
      
      Changelog:
         Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
         Nov 07 2016: convert rootid from and to fs user_ns
         (From ebiederm: mar 28 2017)
           commoncap.c: fix typos - s/v4/v3
           get_vfs_caps_from_disk: clarify the fs_ns root access check
           nsfscaps: change the code split for cap_inode_setxattr()
         Apr 09 2017:
             don't return v3 cap for caps owned by current root.
            return a v2 cap for a true v2 cap in non-init ns
         Apr 18 2017:
            . Change the flow of fscap writing to support s_user_ns writing.
            . Remove refuse_fcap_overwrite().  The value of the previous
              xattr doesn't matter.
         Apr 24 2017:
            . incorporate Eric's incremental diff
            . move cap_convert_nscap to setxattr and simplify its usage
         May 8, 2017:
            . fix leaking dentry refcount in cap_inode_getsecurity
      Signed-off-by: default avatarSerge Hallyn <serge@hallyn.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      8db6c34f
  7. 09 May, 2017 2 commits
    • Michal Hocko's avatar
      treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Michal Hocko authored
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) are basically never failing
      and invoke OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fallback to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      752ade68
    • Michal Hocko's avatar
      fs/xattr.c: zero out memory copied to userspace in getxattr · 81be3dee
      Michal Hocko authored
      getxattr uses vmalloc to allocate memory if kzalloc fails.  This is
      filled by vfs_getxattr and then copied to the userspace.  vmalloc,
      however, doesn't zero out the memory so if the specific implementation
      of the xattr handler is sloppy we can theoretically expose a kernel
      memory.  There is no real sign this is really the case but let's make
      sure this will not happen and use vzalloc instead.
      
      Fixes: 779302e6 ("fs/xattr.c:getxattr(): improve handling of allocation failures")
      Link: http://lkml.kernel.org/r/20170306103327.2766-1-mhocko@kernel.orgAcked-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      81be3dee
  8. 24 Dec, 2016 1 commit
  9. 17 Nov, 2016 1 commit
    • Andreas Gruenbacher's avatar
      xattr: Fix setting security xattrs on sockfs · 4a590153
      Andreas Gruenbacher authored
      The IOP_XATTR flag is set on sockfs because sockfs supports getting the
      "system.sockprotoname" xattr.  Since commit 6c6ef9f2, this flag is checked for
      setxattr support as well.  This is wrong on sockfs because security xattr
      support there is supposed to be provided by security_inode_setsecurity.  The
      smack security module relies on socket labels (xattrs).
      
      Fix this by adding a security xattr handler on sockfs that returns
      -EAGAIN, and by checking for -EAGAIN in setxattr.
      
      We cannot simply check for -EOPNOTSUPP in setxattr because there are
      filesystems that neither have direct security xattr support nor support
      via security_inode_setsecurity.  A more proper fix might be to move the
      call to security_inode_setsecurity into sockfs, but it's not clear to me
      if that is safe: we would end up calling security_inode_post_setxattr after
      that as well.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      4a590153
  10. 08 Oct, 2016 7 commits
  11. 07 Oct, 2016 1 commit
  12. 05 Jul, 2016 1 commit
    • Eric W. Biederman's avatar
      vfs: Don't modify inodes with a uid or gid unknown to the vfs · 0bd23d09
      Eric W. Biederman authored
      When a filesystem outside of init_user_ns is mounted it could have
      uids and gids stored in it that do not map to init_user_ns.
      
      The plan is to allow those filesystems to set i_uid to INVALID_UID and
      i_gid to INVALID_GID for unmapped uids and gids and then to handle
      that strange case in the vfs to ensure there is consistent robust
      handling of the weirdness.
      
      Upon a careful review of the vfs and filesystems about the only case
      where there is any possibility of confusion or trouble is when the
      inode is written back to disk.  In that case filesystems typically
      read the inode->i_uid and inode->i_gid and write them to disk even
      when just an inode timestamp is being updated.
      
      Which leads to a rule that is very simple to implement and understand
      inodes whose i_uid or i_gid is not valid may not be written.
      
      In dealing with access times this means treat those inodes as if the
      inode flag S_NOATIME was set.  Reads of the inodes appear safe and
      useful, but any write or modification is disallowed.  The only inode
      write that is allowed is a chown that sets the uid and gid on the
      inode to valid values.  After such a chown the inode is normal and may
      be treated as such.
      
      Denying all writes to inodes with uids or gids unknown to the vfs also
      prevents several oddball cases where corruption would have occurred
      because the vfs does not have complete information.
      
      One problem case that is prevented is attempting to use the gid of a
      directory for new inodes where the directories sgid bit is set but the
      directories gid is not mapped.
      
      Another problem case avoided is attempting to update the evm hash
      after setxattr, removexattr, and setattr.  As the evm hash includeds
      the inode->i_uid or inode->i_gid not knowning the uid or gid prevents
      a correct evm hash from being computed.  evm hash verification also
      fails when i_uid or i_gid is unknown but that is essentially harmless
      as it does not cause filesystem corruption.
      Acked-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      0bd23d09
  13. 28 May, 2016 1 commit
  14. 27 May, 2016 1 commit
  15. 25 May, 2016 2 commits
  16. 11 Apr, 2016 2 commits
  17. 20 Feb, 2016 1 commit
  18. 22 Jan, 2016 1 commit
    • Al Viro's avatar
      wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  19. 09 Jan, 2016 1 commit
  20. 14 Dec, 2015 2 commits
  21. 07 Dec, 2015 4 commits
  22. 14 Nov, 2015 2 commits
    • Andreas Gruenbacher's avatar
      9p: xattr simplifications · e409de99
      Andreas Gruenbacher authored
      Now that the xattr handler is passed to the xattr handler operations, we
      can use the same get and set operations for the user, trusted, and security
      xattr namespaces.  In those namespaces, we can access the full attribute
      name by "reattaching" the name prefix the vfs has skipped for us.  Add a
      xattr_full_name helper to make this obvious in the code.
      
      For the "system.posix_acl_access" and "system.posix_acl_default"
      attributes, handler->prefix is the full attribute name; the suffix is the
      empty string.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: v9fs-developer@lists.sourceforge.net
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      e409de99
    • Andreas Gruenbacher's avatar
      xattr handlers: Pass handler to operations instead of flags · d9a82a04
      Andreas Gruenbacher authored
      The xattr_handler operations are currently all passed a file system
      specific flags value which the operations can use to disambiguate between
      different handlers; some file systems use that to distinguish the xattr
      namespace, for example.  In some oprations, it would be useful to also have
      access to the handler prefix.  To allow that, pass a pointer to the handler
      to operations instead of the flags value alone.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      d9a82a04
  23. 21 May, 2015 1 commit
  24. 19 Nov, 2014 1 commit
    • Al Viro's avatar
      new helper: audit_file() · 9f45f5bf
      Al Viro authored
      ... for situations when we don't have any candidate in pathnames - basically,
      in descriptor-based syscalls.
      
      [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      9f45f5bf
  25. 12 Oct, 2014 1 commit
    • Eric Biggers's avatar
      vfs: Deduplicate code shared by xattr system calls operating on paths · 8cc43116
      Eric Biggers authored
      The following pairs of system calls dealing with extended attributes only
      differ in their behavior on whether the symbolic link is followed (when
      the named file is a symbolic link):
      
      - setxattr() and lsetxattr()
      - getxattr() and lgetxattr()
      - listxattr() and llistxattr()
      - removexattr() and lremovexattr()
      
      Despite this, the implementations all had duplicated code, so this commit
      redirects each of the above pairs of system calls to a corresponding
      function to which different lookup flags (LOOKUP_FOLLOW or 0) are passed.
      
      For me this reduced the stripped size of xattr.o from 8824 to 8248 bytes.
      Signed-off-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      8cc43116
  26. 23 Jul, 2014 1 commit