1. 03 Jul, 2018 37 commits
  2. 26 Jun, 2018 3 commits
    • Linux 4.14.52 · a26899e0
      Greg Kroah-Hartman authored
    • mm, page_alloc: do not break __GFP_THISNODE by zonelist reset · 1d26c112
      Vlastimil Babka authored
      commit 7810e678 upstream.
      
      In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
      allocations that can ignore memory policies.  The zonelist is obtained
      from current CPU's node.  This is a problem for __GFP_THISNODE
      allocations that want to allocate on a different node, e.g.  because the
      allocating thread has been migrated to a different CPU.
      
      This has been observed to break SLAB in our 4.4-based kernel, because
      there it relies on __GFP_THISNODE working as intended.  If a slab page
      is put on wrong node's list, then further list manipulations may corrupt
      the list because page_to_nid() is used to determine which node's
      list_lock should be locked and thus we may take a wrong lock and race.
      
      Current SLAB implementation seems to be immune by luck thanks to commit
      511e3a05 ("mm/slab: make cache_grow() handle the page allocated on
      arbitrary node") but there may be others assuming that __GFP_THISNODE
      works as promised.
      
      We can fix it by simply removing the zonelist reset completely.  There
      is actually no reason to reset it, because memory policies and cpusets
      don't affect the zonelist choice in the first place.  This was different
      when commit 183f6371 ("mm: ignore mempolicies when using
      ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
      own restricted zonelists.
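
      The failure mode and the fix can be modelled with a toy sketch. This
      is not kernel code: toy_alloc_context, toy_slowpath_node() and its
      parameters are hypothetical names invented for illustration, and the
      model assumes a __GFP_THISNODE-style request that never falls back to
      another node.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the allocation context; not the kernel struct. */
struct toy_alloc_context {
    int zonelist_node;  /* node whose zonelist the allocation will use */
};

/*
 * Toy model of the slow path for a node-bound request.
 * requested_node:      node the caller asked for
 * current_cpu_node:    node of the CPU the thread runs on now (it may
 *                      have been migrated since the request was set up)
 * can_ignore_policies: the allocation may ignore memory policies
 * reset_zonelist:      true models the old behavior, false the fix
 * Returns the node the page would come from.
 */
int toy_slowpath_node(int requested_node, int current_cpu_node,
                      bool can_ignore_policies, bool reset_zonelist)
{
    struct toy_alloc_context ac = { .zonelist_node = requested_node };

    /* Old behavior: replace the zonelist with the current CPU's node. */
    if (can_ignore_policies && reset_zonelist)
        ac.zonelist_node = current_cpu_node;

    return ac.zonelist_node;
}
```

      In this model, a request for node 1 made by a thread that has migrated
      to node 0 lands on node 0 when the reset is applied, and stays on node
      1 once the reset is removed.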
      
      We might consider this for 4.17 although I don't know if there's
      anything currently broken.
      
      SLAB is currently not affected, but in kernels older than 4.7 that don't
      yet have 511e3a05 ("mm/slab: make cache_grow() handle the page
      allocated on arbitrary node") it is.  That's at least 4.4 LTS.  Older
      ones I'll have to check.
      
      So stable backports should be more important, but will have to be
      reviewed carefully, as the code went through many changes.  BTW I think
      that also the ac->preferred_zoneref reset is currently useless if we
      don't also reset ac->nodemask from a mempolicy to NULL first (which we
      probably should for the OOM victims etc?), but I would leave that for a
      separate patch.
      
      Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Fixes: 183f6371 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs/binfmt_misc.c: do not allow offset overflow · 250edf95
      Thadeu Lima de Souza Cascardo authored
      commit 5cc41e09 upstream.
      
      When registering a new binfmt_misc handler, it is possible to overflow
      the offset to get a negative value, which might crash the system, or
      possibly leak kernel data.
      
      Here is a crash log when 2500000000 was used as an offset:
      
        BUG: unable to handle kernel paging request at ffff989cfd6edca0
        IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
        PGD 1ef3e067 P4D 1ef3e067 PUD 0
        Oops: 0000 [#1] SMP NOPTI
        Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
        CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
        RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
        Call Trace:
          search_binary_handler+0x97/0x1d0
          do_execveat_common.isra.34+0x667/0x810
          SyS_execve+0x31/0x40
          do_syscall_64+0x73/0x130
          entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Use kstrtoint instead of simple_strtoul.  It will work as the code
      already set the delimiter byte to '\0' and we only do it when the field
      is not empty.
      
      Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX.  Also tested
      with examples documented at Documentation/admin-guide/binfmt-misc.rst
      and other registrations from packages on Ubuntu.
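
      The overflow and the stricter parse can be sketched in userspace C.
      This is an illustration under assumptions, not the kernel patch:
      parse_offset_unchecked() and parse_offset_checked() are hypothetical
      helpers, strtoul() and strtoll() stand in for simple_strtoul() and
      kstrtoint(), and the 0 / -EINVAL / -ERANGE return convention is
      modelled on how kstrtoint() is understood to report errors.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Old-style parse: an unsigned conversion whose result is stored in an
 * int. Values above INT_MAX convert in an implementation-defined way; on
 * common platforms they wrap to a negative number.
 */
int parse_offset_unchecked(const char *s)
{
    unsigned long v = strtoul(s, NULL, 10);

    return (int)v;
}

/*
 * Stricter parse: the whole string must be a decimal number that fits in
 * an int. Returns 0 on success, -EINVAL for malformed input, and -ERANGE
 * when the value does not fit.
 */
int parse_offset_checked(const char *s, int *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 10);
    if (end == s || *end != '\0')
        return -EINVAL;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -ERANGE;

    *out = (int)v;
    return 0;
}
```

      With the values from the test list above, the checked helper rejects
      2500000000 and UINT_MAX with -ERANGE and accepts INT_MAX. It also
      parses -1 successfully, so a caller that needs a non-negative offset
      would still have to reject negative results separately.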
      
      Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>