1. 30 Sep, 2016 2 commits
    • autofs: Fix automounts by using current_real_cred()->uid · 069d5ac9
      Eric W. Biederman authored
      Seth Forshee reports that in 4.8-rcN some automounts are failing
      because the uid of the task requesting the automount changed.
      
      The relevant call path is:
      follow_automount()
          ->d_automount
          autofs4_d_automount
             autofs4_mount_wait
                 autofs4_wait
      
      In autofs4_wait, wq_uid and wq_gid are set to current_uid() and
      current_gid() respectively.  With follow_automount now overriding creds,
      the uid that we export to userspace changes, and that breaks existing
      setups.
      
      To remove the regression, set wq_uid and wq_gid from
      current_real_cred()->uid and current_real_cred()->gid respectively.
      This restores the previous behavior, as current->real_cred is identical
      to current->cred except when override creds are used.
      
      Cc: stable@vger.kernel.org
      Fixes: aeaa4a79 ("fs: Call d_automount with the filesystems creds")
      Reported-by: Seth Forshee <seth.forshee@canonical.com>
      Tested-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
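
The difference between the overridable credentials and the real credentials described in the autofs fix can be illustrated with a toy model. This is only a sketch: the `Cred` and `Task` classes and the two helper functions are hypothetical stand-ins written for this example, not kernel code, and the uid/gid values are made up.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    uid: int
    gid: int


class Task:
    """Hypothetical stand-in for a task with real and overridable creds."""

    def __init__(self, cred):
        self.real_cred = cred  # not touched by overrides
        self.cred = cred       # currently effective, may be overridden

    @contextmanager
    def override_creds(self, new):
        # Temporarily swap the effective creds, then restore them.
        old, self.cred = self.cred, new
        try:
            yield
        finally:
            self.cred = old


def wait_ids_before_fix(task):
    # Models taking the ids from the effective (possibly overridden) creds.
    return task.cred.uid, task.cred.gid


def wait_ids_after_fix(task):
    # Models taking the ids from the real creds of the requesting task.
    return task.real_cred.uid, task.real_cred.gid


user = Task(Cred(uid=1000, gid=1000))  # made-up requesting user
fs_cred = Cred(uid=0, gid=0)           # made-up filesystem creds

with user.override_creds(fs_cred):
    before = wait_ids_before_fix(user)
    after = wait_ids_after_fix(user)

print(before, after)
```

While the override is active, the first helper reports the filesystem's ids and the second still reports the requesting user's ids, which is the behavior userspace automount daemons were relying on.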
    • mnt: Add a per mount namespace limit on the number of mounts · d2921684
      Eric W. Biederman authored
      CAI Qian <caiqian@redhat.com> pointed out that the semantics
      of shared subtrees make it possible to create an exponentially
      increasing number of mounts in a mount namespace.
      
          mkdir /tmp/1 /tmp/2
          mount --make-rshared /
          for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done
      
      This will create 2^20, or 1048576, mounts, which is a practical problem
      as some people have managed to hit this by accident.
      
      As such CVE-2016-6213 was assigned.
      
      Ian Kent <raven@themaw.net> described the situation for autofs users
      as follows:
      
      > The number of mounts for direct mount maps is usually not very large because of
      > the way they are implemented, large direct mount maps can have performance
      > problems. There can be anywhere from a few (likely case a few hundred) to less
      > than 10000, plus mounts that have been triggered and not yet expired.
      >
      > Indirect mounts have one autofs mount at the root plus the number of mounts that
      > have been triggered and not yet expired.
      >
      > The number of autofs indirect map entries can range from a few to the common
      > case of several thousand and in rare cases up to between 30000 and 50000. I've
      > not heard of people with maps larger than 50000 entries.
      >
      > The larger the number of map entries the greater the possibility for a large
      > number of active mounts so it's not hard to expect cases of a 1000 or somewhat
      > more active mounts.
      
      So I am setting the default number of mounts allowed per mount
      namespace at 100,000.  This is more than enough for any use case I
      know of, but small enough to quickly stop an exponential increase
      in mounts, which should be perfect for catching misconfigurations and
      malfunctioning programs.
      
      For anyone who needs a higher limit this can be changed by writing
      to the new /proc/sys/fs/mount-max sysctl.
      Tested-by: CAI Qian <caiqian@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
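
A rough way to see why a limit of 100,000 stops the runaway case quickly is to model the loop from the commit message. The sketch below assumes the simplest possible model, in which every bind mount onto the shared tree doubles the number of mounts; the function names are hypothetical, and the real kernel accounting is more detailed than this.

```python
def simulate_bind_mounts(iterations, mount_max=None):
    """Return (mount_count, refused_step) under a simple doubling model.

    refused_step is None when every iteration was allowed.
    """
    mounts = 1
    for step in range(1, iterations + 1):
        doubled = mounts * 2
        if mount_max is not None and doubled > mount_max:
            return mounts, step  # this iteration would exceed the limit
        mounts = doubled
    return mounts, None


def read_mount_max(path="/proc/sys/fs/mount-max"):
    """Read the sysctl if the running kernel provides it, else None."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


unlimited = simulate_bind_mounts(20)
limited = simulate_bind_mounts(20, mount_max=100_000)
print(unlimited, limited)
```

In this model the unlimited run ends at 1048576 mounts, while the limited run is refused on the 17th iteration with 65536 mounts in place.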
  2. 23 Sep, 2016 7 commits
    • netns: move {inc,dec}_net_namespaces into #ifdef · 2ed6afde
      Arnd Bergmann authored
      With the newly enforced limit on the number of namespaces,
      we get a build warning if CONFIG_NET_NS is disabled:
      
      net/core/net_namespace.c:273:13: error: 'dec_net_namespaces' defined but not used [-Werror=unused-function]
      net/core/net_namespace.c:268:24: error: 'inc_net_namespaces' defined but not used [-Werror=unused-function]
      
      This moves the two added functions inside the #ifdef that guards
      their callers.
      
      Fixes: 70328660 ("netns: Add a limit on the number of net namespaces")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • nsfs: Simplify __ns_get_path · 213b067c
      Eric W. Biederman authored
      Move mntget from the very beginning of __ns_get_path to
      the success path of __ns_get_path, and remove the now-unneeded
      mntput calls.

      This removes the possibility that there will be a mntget/mntput
      pair if __ns_get_path has to retry, and generally simplifies the code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • Merge branch 'nsfs-ioctls' into HEAD · 78725596
      Eric W. Biederman authored
      From: Andrey Vagin <avagin@openvz.org>
      
      Each namespace has an owning user namespace, and there is currently no
      way to discover these relationships.

      Pid and user namespaces are hierarchical. There is no way to discover
      their parent-child relationships either.

      Why might we want to know the relationships between namespaces?
      
      One use would be visualization, in order to understand the running
      system.  Another would be to answer the question: what capability does
      process X have to perform operations on a resource governed by namespace
      Y?
      
      One more use case (which is usually called abnormal) is checkpoint/restart.
      In CRIU we are going to dump and restore nested namespaces.

      There was a discussion [1] about which interface to choose to determine
      relationships between namespaces.

      Eric suggested adding two ioctls [2]:
      > Grumble, Grumble.  I think this may actually a case for creating ioctls
      > for these two cases.  Now that random nsfs file descriptors are bind
      > mountable the original reason for using proc files is not as pressing.
      >
      > One ioctl for the user namespace that owns a file descriptor.
      > One ioctl for the parent namespace of a namespace file descriptor.
      
      Here is an implementation of these ioctls.
      
      $ man man7/namespaces.7
      ...
      Since Linux 4.X, the following ioctl(2) calls are supported for
      namespace file descriptors.  The correct syntax is:

            fd = ioctl(ns_fd, ioctl_type);

      where ioctl_type is one of the following:

      NS_GET_USERNS
            Returns a file descriptor that refers to an owning user
            namespace.

      NS_GET_PARENT
            Returns a file descriptor that refers to a parent namespace.
            This ioctl(2) can be used for pid and user namespaces.  For
            user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
            meaning.

      In addition to generic ioctl(2) errors, the following specific ones
      can occur:

      EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.

      EPERM  The requested namespace is outside of the current namespace
            scope.
      
      [1] https://lkml.org/lkml/2016/7/6/158
      [2] https://lkml.org/lkml/2016/7/9/101
      
      Changes for v2:
      * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
        outside of the init namespace, so we can return EPERM in this case too.
        > The fewer special cases the easier the code is to get
        > correct, and the easier it is to read. // Eric
      
      Changes for v3:
      * rename ns->get_owner() to ns->owner(). get_* usually means that it
        grabs a reference.
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: "W. Trevor King" <wking@tremily.us>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
    • tools/testing: add a test to check nsfs ioctl-s · 6ad92bf6
      Andrey Vagin authored
      There are two new ioctls:
      One ioctl for the user namespace that owns a file descriptor.
      One ioctl for the parent namespace of a namespace file descriptor.

      The test checks that these ioctls work and that they handle the case
      where a target namespace is outside of the current process namespace.
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • nsfs: add ioctl to get a parent namespace · a7306ed8
      Andrey Vagin authored
      Pid and user namespaces are hierarchical. There is no way to discover
      parent-child relationships.

      In the future we will use this interface to dump and restore nested
      namespaces.
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • nsfs: add ioctl to get an owning user namespace for ns file descriptor · 6786741d
      Andrey Vagin authored
      Each namespace has an owning user namespace, and there is currently no
      way to discover these relationships.

      Understanding namespace relationships makes it possible to answer the
      question: what capability does process X have to perform operations on
      a resource governed by namespace Y?

      After a long discussion, Eric W. Biederman proposed using ioctls for
      this purpose.

      The NS_GET_USERNS ioctl returns a file descriptor to an owning user
      namespace.
      It returns EPERM if a target namespace is outside of the current user
      namespace.
      
      v2: rename parent to relative
      
      v3: Add a missing mntput when returning -EAGAIN --EWB
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Link: https://lkml.org/lkml/2016/7/6/158
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • kernel: add a helper to get an owning user namespace for a namespace · bcac25a5
      Andrey Vagin authored
      Return -EPERM if an owning user namespace is outside of the process's
      current user namespace.

      v2: In the first version ns_get_owner returned ENOENT for init_user_ns.
          This special case was removed from this version. There is nothing
          outside of init_user_ns, so we can return EPERM.
      v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
          grabs a reference.
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
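
The "outside of the current user namespace" rule, including the v2 decision to drop the init_user_ns special case, can be pictured as a walk up the parent chain. The following is a toy model only: `UserNS` and `get_owner` are hypothetical names for this sketch, and it assumes the check amounts to "the owner is the caller's user namespace or one of its descendants".

```python
import errno


class UserNS:
    """Hypothetical user namespace node; the initial one has no parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def get_owner(owner, current_user_ns):
    """Return owner if it lies at or below current_user_ns, else EPERM."""
    p = owner
    while p is not None:
        if p is current_user_ns:
            return owner
        p = p.parent
    # Ran off the top of the hierarchy. This also covers owner=None, the
    # case of asking for the owner of the initial user namespace itself.
    raise PermissionError(errno.EPERM, "owner is outside the current user namespace")


init_ns = UserNS("init")
child = UserNS("child", parent=init_ns)
grandchild = UserNS("grandchild", parent=child)

print(get_owner(grandchild, init_ns).name)
try:
    get_owner(init_ns, child)
except PermissionError as err:
    print("refused:", err.errno == errno.EPERM)
```

Because the loop simply runs out of parents, no separate ENOENT branch is needed for the top of the hierarchy, which is the simplification the v2 note describes.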
  3. 22 Sep, 2016 8 commits
  4. 31 Aug, 2016 1 commit
  5. 08 Aug, 2016 12 commits
  6. 07 Aug, 2016 10 commits
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 857953d7
      Linus Torvalds authored
      Pull more block fixes from Jens Axboe:
       "As mentioned in the pull the other day, a few more fixes for this
        round, all related to the bio op changes in this series.
      
        Two fixes, and then a cleanup, renaming bio->bi_rw to bio->bi_opf.  I
        wanted to do that change right after or right before -rc1, so that
        risk of conflict was reduced.  I just rebased the series on top of
        current master, and no new ->bi_rw usage has snuck in"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: rename bio bi_rw to bi_opf
        target: iblock_execute_sync_cache() should use bio_set_op_attrs()
        mm: make __swap_writepage() use bio_set_op_attrs()
        block/mm: make bdev_ops->rw_page() take a bool for read/write
    • Merge tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux · 635a4ba1
      Linus Torvalds authored
      Pull drm zpos property support from Dave Airlie:
       "This tree was waiting on some media stuff I hadn't had time to get a
        stable branchpoint off, so I just waited until it was all in your tree
        first.
      
        It's been around a bit on the list and shouldn't affect anything
        outside adding the generic API and moving some ARM drivers to using
        it"
      
      * tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux:
        drm: rcar: use generic code for managing zpos plane property
        drm/exynos: use generic code for managing zpos plane property
        drm: sti: use generic zpos for plane
        drm: add generic zpos property
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portion. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
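
Why manually set values go wrong once the op code lives in the high bits can be shown with a small bit-packing model. Everything concrete here is an assumption made for illustration: a 32-bit field, 3 bits reserved for the op code, and the particular op and flag values. The helper names only loosely mirror the idea of bio_set_op_attrs().

```python
FIELD_BITS = 32                    # assumed width of the field
OP_BITS = 3                        # assumed number of op-code bits
OP_SHIFT = FIELD_BITS - OP_BITS
FLAG_MASK = (1 << OP_SHIFT) - 1

OP_READ, OP_WRITE = 0, 1           # illustrative op codes
FLAG_SYNC = 1 << 4                 # illustrative flag bit


def set_op_attrs(op, flags):
    """Pack the op code into the high bits and the flags into the low bits."""
    return (op << OP_SHIFT) | (flags & FLAG_MASK)


def get_op(opf):
    return opf >> OP_SHIFT


def get_flags(opf):
    return opf & FLAG_MASK


new_style = set_op_attrs(OP_WRITE, FLAG_SYNC)

# Old habit: treat bit 0 as "write" and OR it together with the flags.
legacy = 1 | FLAG_SYNC

print(get_op(new_style), get_op(legacy))
```

In this model the legacy value decodes as op 0, so a write assembled the old way would silently be treated as a read. Renaming the member turns that silent misbehavior into a compile error.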
    • target: iblock_execute_sync_cache() should use bio_set_op_attrs() · 31c64f78
      Jens Axboe authored
      The original commit missed this function, which needs to mark the
      bio as a write flush.
      
      Cc: Mike Christie <mchristi@redhat.com>
      Fixes: e742fc32 ("target: use bio op accessors")
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • mm: make __swap_writepage() use bio_set_op_attrs() · ba13e83e
      Jens Axboe authored
      Cleaner than manipulating bio->bi_rw flags directly.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe authored
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Merge tag 'doc-4.8-fixes' of git://git.lwn.net/linux · 52ddb7e9
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "Three fixes for the docs build, including removing an annoying warning
        on 'make help' if sphinx isn't present"
      
      * tag 'doc-4.8-fixes' of git://git.lwn.net/linux:
        DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1
        Documenation: update cgroup's document path
        Documentation/sphinx: do not warn about missing tools in 'make help'
    • Merge tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc · e9d488c3
      Linus Torvalds authored
      Pull binfmt_misc update from James Bottomley:
       "This update is to allow architecture emulation containers to function
        such that the emulation binary can be housed outside the container
        itself.  The container and fs parts both have acks from relevant
        experts.
      
        To use the new feature you have to add an F option to your binfmt_misc
        configuration"
      
      From the docs:
       "The usual behaviour of binfmt_misc is to spawn the binary lazily when
        the misc format file is invoked.  However, this doesn't work very well
        in the face of mount namespaces and changeroots, so the F mode opens
        the binary as soon as the emulation is installed and uses the opened
        image to spawn the emulator, meaning it is always available once
        installed, regardless of how the environment changes"
      
      * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
        binfmt_misc: add F option description to documentation
        binfmt_misc: add persistent opened binary handler for containers
        fs: add filp_clone_open API
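
For readers who have not written a binfmt_misc rule before, the F option goes in the last field of the registration string. The sketch below builds such a string using the usual `:name:type:offset:magic:mask:interpreter:flags` layout; the helper name, the `demo` extension, and the interpreter path are all hypothetical, and the string is only printed, not registered.

```python
def binfmt_rule(name, kind, magic, interpreter, flags="", offset="", mask=""):
    """Build a ':'-delimited binfmt_misc registration string."""
    fields = [name, kind, offset, magic, mask, interpreter, flags]
    for field in fields:
        if ":" in field:
            # This sketch uses ':' as the delimiter, so fields may not contain it.
            raise ValueError(f"field contains the delimiter: {field!r}")
    return ":" + ":".join(fields)


# Hypothetical rule: run files with a ".demo" extension through an emulator,
# with F requesting that the interpreter be opened at registration time.
rule = binfmt_rule(
    name="demo",
    kind="E",  # match on file extension
    magic="demo",
    interpreter="/usr/bin/demo-emulator",
    flags="F",
)
print(rule)
```

Such a string is normally written to the `register` file of a mounted binfmt_misc filesystem, typically /proc/sys/fs/binfmt_misc/register, which requires root.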
    • fs: return EPERM on immutable inode · 337684a1
      Eryu Guan authored
      In most cases, EPERM is returned on an immutable inode, and there are
      only a few places returning EACCES. I noticed this when running LTP on
      overlayfs: setxattr03 failed due to an unexpected EACCES on an immutable
      inode.

      So convert all EACCES to EPERM on immutable inodes.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · fe64f328
      Linus Torvalds authored
      Pull more vfs updates from Al Viro:
       "Assorted cleanups and fixes.
      
        In the "trivial API change" department - ->d_compare() losing 'parent'
        argument"
      
      * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        cachefiles: Fix race between inactivating and culling a cache object
        9p: use clone_fid()
        9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
        vfs: make dentry_needs_remove_privs() internal
        vfs: remove file_needs_remove_privs()
        vfs: fix deadlock in file_remove_privs() on overlayfs
        get rid of 'parent' argument of ->d_compare()
        cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
        affs ->d_compare(): don't bother with ->d_inode
        fold _d_rehash() and __d_rehash() together
        fold dentry_rcuwalk_invalidate() into its only remaining caller