1. 28 Dec, 2018 11 commits
  2. 24 Dec, 2018 19 commits
  3. 23 Dec, 2018 2 commits
  4. 22 Dec, 2018 5 commits
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 9105b8aa
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is two simple target fixes and one discard related I/O starvation
        problem in sd.
      
        The discard problem occurs because the discard page doesn't have a
        mempool backing so if the allocation fails due to memory pressure, we
        then lose the forward progress we require if the writeout is on the
        same device. The fix is to back it with a mempool"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: use mempool for discard special page
        scsi: target: iscsi: cxgbit: add missing spin_lock_init()
        scsi: target: iscsi: cxgbit: fix csk leak
      9105b8aa
    • Linus Torvalds's avatar
      Merge tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux · 1104bd96
      Linus Torvalds authored
      Pull compiler_types.h fix from Miguel Ojeda:
       "A cleanup for userspace in compiler_types.h: don't pollute userspace
        with macro definitions (Xiaozhou Liu)
      
        This is harmless for the kernel, but v4.19 was released with a few
        macros exposed to userspace as the patch explains; which this removes,
        so it *could* happen that we break something for someone (although
        leaving inline redefined is probably worse)"
      
      * tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux:
        include/linux/compiler_types.h: don't pollute userspace with macro definitions
      1104bd96
    • Linus Torvalds's avatar
      Merge tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux · 38c0ecf6
      Linus Torvalds authored
      Pull auxdisplay fix from Miguel Ojeda:
       "charlcd: fix x/y command parsing (Mans Rullgard)"
      
      * tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux:
        auxdisplay: charlcd: fix x/y command parsing
      38c0ecf6
    • Christian Brauner's avatar
      Revert "vfs: Allow userns root to call mknod on owned filesystems." · 94f82008
      Christian Brauner authored
      This reverts commit 55956b59.
      
      commit 55956b59 ("vfs: Allow userns root to call mknod on owned filesystems.")
      enabled mknod() in user namespaces for userns root if CAP_MKNOD is
      available. However, these device nodes are useless since any filesystem
      mounted from a non-initial user namespace will set the SB_I_NODEV flag on
      the filesystem. Now, when a device node s created in a non-initial user
      namespace a call to open() on said device node will fail due to:
      
      bool may_open_dev(const struct path *path)
      {
              return !(path->mnt->mnt_flags & MNT_NODEV) &&
                      !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
      }
      
      The problem with this is that as of the aforementioned commit mknod()
      creates partially functional device nodes in non-initial user namespaces.
      In particular, it has the consequence that as of the aforementioned commit
      open() will be more privileged with respect to device nodes than mknod().
      Before it was the other way around. Specifically, if mknod() succeeded
      then it was transparent for any userspace application that a fatal error
      must have occured when open() failed.
      
      All of this breaks multiple userspace workloads and a widespread assumption
      about how to handle mknod(). Basically, all container runtimes and systemd
      live by the slogan "ask for forgiveness not permission" when running user
      namespace workloads. For mknod() the assumption is that if the syscall
      succeeds the device nodes are useable irrespective of whether it succeeds
      in a non-initial user namespace or not. This logic was chosen explicitly
      to allow for the glorious day when mknod() will actually be able to create
      fully functional device nodes in user namespaces.
      A specific problem people are already running into when running 4.18 rc
      kernels are failing systemd services. For any distro that is run in a
      container systemd services started with the PrivateDevices= property set
      will fail to start since the device nodes in question cannot be
      opened (cf. the arguments in [1]).
      
      Full disclosure, Seth made the very sound argument that it is already
      possible to end up with partially functional device nodes. Any filesystem
      mounted with MS_NODEV set will allow mknod() to succeed but will not allow
      open() to succeed. The difference to the case here is that the MS_NODEV
      case is transparent to userspace since it is an explicitly set mount option
      while the SB_I_NODEV case is an implicit property enforced by the kernel
      and hence opaque to userspace.
      
      [1]: https://github.com/systemd/systemd/pull/9483Signed-off-by: default avatarChristian Brauner <christian@brauner.io>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Seth Forshee <seth.forshee@canonical.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      94f82008
    • Christoph Hellwig's avatar
      dma-mapping: fix flags in dma_alloc_wc · 0cd60eb1
      Christoph Hellwig authored
      We really need the writecombine flag in dma_alloc_wc, fix a stupid
      oversight.
      
      Fixes: 7ed1d91a ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN")
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0cd60eb1
  5. 21 Dec, 2018 3 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 23203e3f
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "4 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, page_alloc: fix has_unmovable_pages for HugePages
        fork,memcg: fix crash in free_thread_stack on memcg charge fail
        mm: thp: fix flags for pmd migration when split
        mm, memory_hotplug: initialize struct pages for the full memory section
      23203e3f
    • Oscar Salvador's avatar
      mm, page_alloc: fix has_unmovable_pages for HugePages · 17e2e7d7
      Oscar Salvador authored
      While playing with gigantic hugepages and memory_hotplug, I triggered
      the following #PF when "cat memoryX/removable":
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        #PF error: [normal kernel read fault]
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        CPU: 1 PID: 1481 Comm: cat Tainted: G            E     4.20.0-rc6-mm1-1-default+ #18
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
        RIP: 0010:has_unmovable_pages+0x154/0x210
        Call Trace:
         is_mem_section_removable+0x7d/0x100
         removable_show+0x90/0xb0
         dev_attr_show+0x1c/0x50
         sysfs_kf_seq_show+0xca/0x1b0
         seq_read+0x133/0x380
         __vfs_read+0x26/0x180
         vfs_read+0x89/0x140
         ksys_read+0x42/0x90
         do_syscall_64+0x5b/0x180
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is we do not pass the Head to page_hstate(), and so, the call
      to compound_order() in page_hstate() returns 0, so we end up checking
      all hstates's size to match PAGE_SIZE.
      
      Obviously, we do not find any hstate matching that size, and we return
      NULL.  Then, we dereference that NULL pointer in
      hugepage_migration_supported() and we got the #PF from above.
      
      Fix that by getting the head page before calling page_hstate().
      
      Also, since gigantic pages span several pageblocks, re-adjust the logic
      for skipping pages.  While are it, we can also get rid of the
      round_up().
      
      [osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal]
        Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
      Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.deSigned-off-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      17e2e7d7
    • Rik van Riel's avatar
      fork,memcg: fix crash in free_thread_stack on memcg charge fail · 5eed6f1d
      Rik van Riel authored
      Commit 9b6f7e16 ("mm: rework memcg kernel stack accounting") will
      result in fork failing if allocating a kernel stack for a task in
      dup_task_struct exceeds the kernel memory allowance for that cgroup.
      
      Unfortunately, it also results in a crash.
      
      This is due to the code jumping to free_stack and calling
      free_thread_stack when the memcg kernel stack charge fails, but without
      tsk->stack pointing at the freshly allocated stack.
      
      This in turn results in the vfree_atomic in free_thread_stack oopsing
      with a backtrace like this:
      
      #5 [ffffc900244efc88] die at ffffffff8101f0ab
       #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
       #7 [ffffc900244efce0] general_protection at ffffffff818ff082
          [exception RIP: llist_add_batch+7]
          RIP: ffffffff8150d487  RSP: ffffc900244efd98  RFLAGS: 00010282
          RAX: 0000000000000000  RBX: ffff88085ef55980  RCX: 0000000000000000
          RDX: ffff88085ef55980  RSI: 343834343531203a  RDI: 343834343531203a
          RBP: ffffc900244efd98   R8: 0000000000000001   R9: ffff8808578c3600
          R10: 0000000000000000  R11: 0000000000000001  R12: ffff88029f6c21c0
          R13: 0000000000000286  R14: ffff880147759b00  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
       #9 [ffffc900244efdb8] copy_process at ffffffff81086e37
      #10 [ffffc900244efe98] _do_fork at ffffffff810884e0
      #11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
      #12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
          RIP: 000000000049b948  RSP: 00007ffcdb307830  RFLAGS: 00000246
          RAX: ffffffffffffffda  RBX: 0000000000896030  RCX: 000000000049b948
          RDX: 0000000000000000  RSI: 00007ffcdb307790  RDI: 00000000005d7421
          RBP: 000000000067370f   R8: 00007ffcdb3077b0   R9: 000000000001ed00
          R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000040
          R13: 000000000000000f  R14: 0000000000000000  R15: 000000000088d018
          ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b
      
      The simplest fix is to assign tsk->stack right where it is allocated.
      
      Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
      Fixes: 9b6f7e16 ("mm: rework memcg kernel stack accounting")
      Signed-off-by: default avatarRik van Riel <riel@surriel.com>
      Acked-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5eed6f1d