1. 23 Jun, 2017 8 commits
    • fs/exec.c: account for argv/envp pointers · 98da7d08
      Kees Cook authored
      When limiting the argv/envp strings during exec to 1/4 of the stack limit,
      the storage of the pointers to the strings was not included.  This means
      that an exec with huge numbers of tiny strings could eat 1/4 of the stack
      limit in strings and then additional space would be later used by the
      pointers to the strings.
      
      For example, on 32-bit with an 8MB stack rlimit, an exec with 1677721
      single-byte strings would consume less than 2MB of stack, the max (8MB /
      4) amount allowed, but the pointers to the strings would consume the
      remaining additional stack space (1677721 * 4 == 6710884).
      
      The result (1677721 + 6710884 == 8388605) would exhaust stack space
      entirely.  Controlling this stack exhaustion could result in
      pathological behavior in setuid binaries (CVE-2017-1000365).
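
The arithmetic above can be checked with a small C sketch. The helper names (`string_bytes`, `pointer_bytes`, `fits_with_pointers`) are hypothetical and are not taken from fs/exec.c; the sketch only illustrates why counting the pointer array against the limit changes the outcome.

```c
#include <stdint.h>

/* Bytes used by the strings alone: count strings of str_len bytes each. */
static uint64_t string_bytes(uint64_t count, uint64_t str_len)
{
	return count * str_len;
}

/* Bytes used by the pointer array: one pointer per string. */
static uint64_t pointer_bytes(uint64_t count, uint64_t ptr_size)
{
	return count * ptr_size;
}

/*
 * Hypothetical check: returns 1 if the strings plus their pointers fit
 * within limit bytes, 0 otherwise.
 */
static int fits_with_pointers(uint64_t count, uint64_t str_len,
			      uint64_t ptr_size, uint64_t limit)
{
	return string_bytes(count, str_len) +
	       pointer_bytes(count, ptr_size) <= limit;
}
```

With 1677721 single-byte strings and 4-byte pointers, the strings alone stay under the 8MB / 4 limit, but strings plus pointers total 8388605 bytes, so a check that includes the pointers against that limit rejects the exec.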
      
      [akpm@linux-foundation.org: additional commenting from Kees]
      Fixes: b6a2fea3 ("mm: variable length argument support")
      Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix deadlock caused by recursive locking in xattr · 8818efaa
      Eric Ren authored
      Another deadlock path caused by recursive locking has been reported.  This
      kind of issue was introduced by commit 743b5f14 ("ocfs2: take
      inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths have been
      fixed by commit b891fa50 ("ocfs2: fix deadlock issue when taking
      inode lock at vfs entry points").  Yes, we intend to fix this kind of
      case in an incremental way, because it's hard to find all possible
      paths at once.
      
      This one can be reproduced as follows.  On node1, cp a large file from
      the home directory to the ocfs2 mountpoint.  Meanwhile, on node2, run
      setfacl/getfacl.  Both nodes will hang.  The backtraces:
      
      On node1:
        __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
        ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
        ocfs2_write_begin+0x43/0x1a0 [ocfs2]
        generic_perform_write+0xa9/0x180
        __generic_file_write_iter+0x1aa/0x1d0
        ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
        __vfs_write+0xc3/0x130
        vfs_write+0xb1/0x1a0
        SyS_write+0x46/0xa0
      
      On node2:
        __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
        ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
        ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
        ocfs2_set_acl+0x22d/0x260 [ocfs2]
        ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
        set_posix_acl+0x75/0xb0
        posix_acl_xattr_set+0x49/0xa0
        __vfs_setxattr+0x69/0x80
        __vfs_setxattr_noperm+0x72/0x1a0
        vfs_setxattr+0xa7/0xb0
        setxattr+0x12d/0x190
        path_setxattr+0x9f/0xb0
        SyS_setxattr+0x14/0x20
      
      Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
      exported by commit 439a36b8 ("ocfs2/dlmglue: prepare tracking logic
      to avoid recursive cluster lock").
      
      Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
      Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      Signed-off-by: Eric Ren <zren@suse.com>
      Reported-by: Thomas Voegtle <tv@lio96.de>
      Tested-by: Thomas Voegtle <tv@lio96.de>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: make sysfs file removal asynchronous · 3b7b3140
      Tejun Heo authored
      Commit bf5eb3de ("slub: separate out sysfs_slab_release() from
      sysfs_slab_remove()") made slub sysfs file removals synchronous to
      kmem_cache shutdown.
      
      Unfortunately, this created a possible ABBA deadlock between slab_mutex
      and the sysfs draining mechanism, triggering the following lockdep warning.
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        4.10.0-test+ #48 Not tainted
        -------------------------------------------------------
        rmmod/1211 is trying to acquire lock:
         (s_active#120){++++.+}, at: [<ffffffff81308073>] kernfs_remove+0x23/0x40
      
        but task is already holding lock:
         (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (slab_mutex){+.+.+.}:
      	 lock_acquire+0xf6/0x1f0
      	 __mutex_lock+0x75/0x950
      	 mutex_lock_nested+0x1b/0x20
      	 slab_attr_store+0x75/0xd0
      	 sysfs_kf_write+0x45/0x60
      	 kernfs_fop_write+0x13c/0x1c0
      	 __vfs_write+0x28/0x120
      	 vfs_write+0xc8/0x1e0
      	 SyS_write+0x49/0xa0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        -> #0 (s_active#120){++++.+}:
      	 __lock_acquire+0x10ed/0x1260
      	 lock_acquire+0xf6/0x1f0
      	 __kernfs_remove+0x254/0x320
      	 kernfs_remove+0x23/0x40
      	 sysfs_remove_dir+0x51/0x80
      	 kobject_del+0x18/0x50
      	 __kmem_cache_shutdown+0x3e6/0x460
      	 kmem_cache_destroy+0x1fb/0x2d0
      	 kvm_exit+0x2d/0x80 [kvm]
      	 vmx_exit+0x19/0xa1b [kvm_intel]
      	 SyS_delete_module+0x198/0x1f0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(slab_mutex);
      				 lock(s_active#120);
      				 lock(slab_mutex);
          lock(s_active#120);
      
         *** DEADLOCK ***
      
        2 locks held by rmmod/1211:
         #0:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff810a7877>] get_online_cpus+0x37/0x80
         #1:  (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        stack backtrace:
        CPU: 3 PID: 1211 Comm: rmmod Not tainted 4.10.0-test+ #48
        Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
        Call Trace:
         print_circular_bug+0x1be/0x210
         __lock_acquire+0x10ed/0x1260
         lock_acquire+0xf6/0x1f0
         __kernfs_remove+0x254/0x320
         kernfs_remove+0x23/0x40
         sysfs_remove_dir+0x51/0x80
         kobject_del+0x18/0x50
         __kmem_cache_shutdown+0x3e6/0x460
         kmem_cache_destroy+0x1fb/0x2d0
         kvm_exit+0x2d/0x80 [kvm]
         vmx_exit+0x19/0xa1b [kvm_intel]
         SyS_delete_module+0x198/0x1f0
         ? SyS_delete_module+0x5/0x1f0
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It would be cleanest to deal with the issue by removing sysfs files
      without holding slab_mutex before the rest of shutdown; however, given
      the current code structure, it is pretty difficult to do so.
      
      This patch punts sysfs file removal to a work item.  Before commit
      bf5eb3de, the removal was punted to an RCU delayed work item which is
      executed after release.  Now, we're punting to a different work item on
      shutdown, which still maintains the goal of removing the sysfs files
      earlier when destroying kmem_caches.
      
      Link: http://lkml.kernel.org/r/20170620204512.GI21326@htj.duckdns.org
      Fixes: bf5eb3de ("slub: separate out sysfs_slab_release() from sysfs_slab_remove()")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/cmdline.c: fix get_options() overflow while parsing ranges · a91e0f68
      Ilya Matveychikov authored
      When using get_options() it's possible to specify a range of numbers,
      like 1-100500.  The problem is that get_options() doesn't track the array
      size when it calls get_range() internally, which iterates over the range
      and fills the memory with numbers.
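
A minimal sketch of bounded range expansion, assuming the caller knows the capacity of the output array. `expand_range_bounded` is a hypothetical helper written for illustration; it is not the kernel's get_range() and does not reproduce its signature.

```c
#include <stddef.h>

/*
 * Hypothetical helper: expand the inclusive range [first, last] into out[],
 * writing at most cap entries.  Returns the number of entries written.
 */
static size_t expand_range_bounded(int first, int last, int *out, size_t cap)
{
	size_t n = 0;

	if (first > last)
		return 0;

	for (int x = first; n < cap; x++) {
		out[n++] = x;
		/* Stop before incrementing past last (avoids signed overflow). */
		if (x == last)
			break;
	}
	return n;
}
```

With a 4-entry buffer, a range such as 1-100500 stops after four values instead of writing past the end of the array.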
      
      Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.com
      Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/dax.c: fix inefficiency in dax_writeback_mapping_range() · 1eb643d0
      Jan Kara authored
      dax_writeback_mapping_range() fails to update the iteration index when
      searching the radix tree for entries needing cache flushing.  Thus each
      pagevec's worth of entries is searched from the beginning, which is
      inefficient and prone to livelocks.  Update the index properly.
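
The kind of index update described here can be sketched in userspace C. Everything below is an assumption made for illustration: `lookup_batch` stands in for a tagged radix-tree lookup, `BATCH` for a pagevec's capacity, and `visit_all` for the writeback loop; none of these names come from fs/dax.c.

```c
#include <limits.h>
#include <stddef.h>

#define BATCH 4	/* stand-in for a pagevec's capacity; the value is arbitrary */

/*
 * Hypothetical stand-in for a tagged lookup: copy up to BATCH indices
 * that are >= start from the sorted array idx[] into found[].
 */
static size_t lookup_batch(const unsigned long *idx, size_t n,
			   unsigned long start, unsigned long *found)
{
	size_t got = 0;

	for (size_t i = 0; i < n && got < BATCH; i++)
		if (idx[i] >= start)
			found[got++] = idx[i];
	return got;
}

/*
 * Visit every index once, resuming each lookup just past the last entry
 * handled.  Returns the number of non-empty batches processed.
 */
static size_t visit_all(const unsigned long *idx, size_t n, size_t *visited)
{
	unsigned long found[BATCH];
	unsigned long start = 0;
	size_t batches = 0, got;

	*visited = 0;
	while ((got = lookup_batch(idx, n, start, found)) > 0) {
		batches++;
		*visited += got;
		if (found[got - 1] == ULONG_MAX)
			break;	/* nothing can follow the largest index */
		/* Without this update every lookup would restart from 0. */
		start = found[got - 1] + 1;
	}
	return batches;
}
```

If `start` were never advanced, each call to `lookup_batch` would return the same first batch and the loop would not terminate, which is the livelock risk the message mentions.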
      
      Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz
      Fixes: 9973c98e ("dax: add support for fsync/sync")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL · 9fa4eb8e
      NeilBrown authored
      If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
      autofs4_d_automount() will return
      
         ERR_PTR(status)
      
      with that status to follow_automount(), which will then dereference an
      invalid pointer.
      
      So treat a positive status the same as zero, and map to ENOENT.
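
The mapping described above can be sketched as a small helper. `sanitize_fail_status` is a hypothetical name used for illustration only; the sketch assumes, as the message states, that only negative values are meaningful error codes.

```c
#include <errno.h>

/*
 * Hypothetical helper: pass negative error codes through unchanged and
 * map zero or any positive status to -ENOENT, so the value is always
 * safe to wrap in an error pointer.
 */
static int sanitize_fail_status(int status)
{
	return status < 0 ? status : -ENOENT;
}
```

A status of -EIO is kept as is, while 0 and 5 both become -ENOENT.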
      
      See comment in systemd src/core/automount.c::automount_send_ready().
      
      Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name
      Signed-off-by: NeilBrown <neilb@suse.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings · 029c54b0
      Ard Biesheuvel authored
      Existing code that uses vmalloc_to_page() may assume that any address
      for which is_vmalloc_addr() returns true may be passed into
      vmalloc_to_page() to retrieve the associated struct page.
      
      This is not an unreasonable assumption to make, but on architectures
      that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need
      to ensure that vmalloc_to_page() does not go off into the weeds trying
      to dereference huge PUDs or PMDs as table entries.
      
      Given that vmalloc() and vmap() themselves never create huge mappings or
      deal with compound pages at all, there is no correct answer in this
      case, so return NULL instead, and issue a warning.
      
      When reading /proc/kcore on arm64, you will hit an oops as soon as you
      hit the huge mappings used for the various segments that make up the
      mapping of vmlinux.  With this patch applied, you will no longer hit the
      oops, but the kcore contents will be incorrect (these regions will be
      zeroed out).
      
      We are fixing this for kcore specifically, so it avoids vread() for
      those regions.  At least one other problematic user exists, i.e.,
      /dev/kmem, but that is currently broken on arm64 for other reasons.
      
      Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Laura Abbott <labbott@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, thp: remove cond_resched from __collapse_huge_page_copy · c891d9f6
      David Rientjes authored
      This is a partial revert of commit 338a16ba ("mm, thp: copying user
      pages must schedule on collapse") which added a cond_resched() to
      __collapse_huge_page_copy().
      
      On x86 with CONFIG_HIGHPTE, __collapse_huge_page_copy is called in
      atomic context and thus scheduling is not possible.  This is only a
      possible config on arm and i386.
      
      Although need_resched has been shown to be set for over 100 jiffies
      while doing the iteration in __collapse_huge_page_copy, this is better
      than doing
      
      	if (!in_atomic())
      		cond_resched()
      
      to cover only non-CONFIG_HIGHPTE configs.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706191341550.97821@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
      Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 22 Jun, 2017 7 commits
    • Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · a38371cb
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Various small fixes for stable"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Fix some return values in case of error in 'crypt_message'
        cifs: remove redundant return in cifs_creation_time_get
        CIFS: Improve readdir verbosity
        CIFS: check if pages is null rather than bv for a failed allocation
        CIFS: Set ->should_dirty in cifs_user_readv()
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3f7ba7e1
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "MIPS:
         - Fix build with KVM, DYNAMIC_DEBUG and JUMP_LABEL.
      
        PPC:
         - Fix host crashes/hangs on POWER9.
         - Properly restore userspace state after KVM_RUN ioctl.
      
        s390:
         - Fix address translation in odd-ball cases (real-space designation
           ASCEs).
      
        x86:
         - Fix privilege escalation in 64-bit Windows guests
      
        All patches are for stable and the x86 also has a CVE"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix singlestepping over syscall
        KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows
        KVM: MIPS: Fix maybe-uninitialized build failure
        KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1
        KVM: PPC: Book3S HV: Save/restore host values of debug registers
        KVM: PPC: Book3S HV: Preserve userspace HTM state properly
        KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
        KVM: PPC: Book3S HV: Context-switch EBB registers properly
        KVM: PPC: Book3S HV: Cope with host using large decrementer mode
    • Merge tag 'mfd-fixes-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 4f92f0e2
      Linus Torvalds authored
      Pull MFD fixes from Lee Jones:
      
       - arizona: use address passed in, rather than hard coded value
      
       - correct STM32 clock-names value in DT binding documentation
      
      * tag 'mfd-fixes-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
        dt-bindings: mfd: Update STM32 timers clock names
        mfd: arizona: Fix typo using hard-coded register
    • KVM: x86: fix singlestepping over syscall · c8401dda
      Paolo Bonzini authored
      TF is handled a bit differently for syscall and sysret, compared
      to the other instructions: TF is checked after the instruction completes,
      so that the OS can disable #DB at a syscall by adding TF to FMASK.
      When the sysret is executed the #DB is taken "as if" the syscall insn
      just completed.
      
      KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
      Fix the behavior, otherwise you could get #DB on a user stack which is not
      nice.  This does not affect Linux guests, as they use an IST or task gate
      for #DB.
      
      This fixes CVE-2017-7518.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • Merge tag 'kvm-s390-master-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux · d6aa07c1
      Radim Krčmář authored
      KVM: s390: fix shadow table handling for nested guests
      
      Some odd-ball cases (real-space designation ASCEs) are handled incorrectly
      for the shadow page tables. Fix it.
    • KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows · addb63c1
      Heiko Carstens authored
      For real-space designation asces the asce origin part is only a token.
      The asce token origin must not be used to generate an effective
      address for storage references. This however is erroneously done
      within kvm_s390_shadow_tables().
      
      Furthermore within the same function the wrong parts of virtual
      addresses are used to generate a corresponding real address
      (e.g. the region second index is used as region first index).
      
      Both of the above can result in incorrect address translations. The
      translation was correct only for real-space designations with a token
      origin of zero and addresses below one megabyte.
      
      Furthermore replace a "!asce.r" statement with a "!*fake" statement to
      make it more obvious that a specific condition has nothing to do with
      the architecture, but with the fake handling of real space designations.
      
      Fixes: 3218f709 ("s390/mm: support real-space for gmap shadows")
      Cc: David Hildenbrand <david@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 8d829b9b
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "This contains a set of fixes for xen-blkback by way of Konrad, and a
        performance regression fix for blk-mq for shared tags.
      
        The latter could account for as much as a 50x reduction in
        performance, with the test case from the user with 500 name spaces. A
        more realistic setup on my end with 32 drives showed a 3.5x drop. The
        fix has been thoroughly tested before being committed"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: fix performance regression with shared tags
        xen-blkback: don't leak stack data via response ring
        xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
        xen/blkback: don't free be structure too early
        xen/blkback: fix disconnect while I/Os in flight
  3. 21 Jun, 2017 12 commits
  4. 20 Jun, 2017 13 commits