1. 08 Mar, 2017 20 commits
  2. 03 Mar, 2017 4 commits
    • UBUNTU: Ubuntu-4.4.0-66.87 · 05022128
      Stefan Bader authored
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
    • tty: n_hdlc: get rid of racy n_hdlc.tbuf · 579abe8d
      Alexander Popov authored
      Currently the N_HDLC line discipline uses a self-made singly linked
      list for data buffers and keeps an n_hdlc.tbuf pointer for
      retransmitting a buffer after an error.
      
      The commit be10eb75
      ("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
      After a tx error, concurrent flush_tx_queue() and n_hdlc_send_frames()
      can put one data buffer on tx_free_buf_list twice, which causes a
      double free in n_hdlc_release().
      
      Let's use a standard kernel linked list and get rid of n_hdlc.tbuf:
      in case of a tx error, put the current data buffer back after the head
      of tx_buf_list.
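      A minimal kernel-style sketch of the resulting shape (illustrative,
      not the actual driver code; the struct layout and helpers only
      approximate the fix): buffers embed a struct list_head, each list
      carries its own spinlock, and a buffer that failed to transmit is
      re-queued at the head so it goes out first on retry, replacing the
      racy tbuf pointer.

      struct n_hdlc_buf {
              struct list_head list_item;
              int count;
              char buf[1];
      };

      struct n_hdlc_buf_list {
              struct list_head list;
              int count;
              spinlock_t spinlock;
      };

      /* release a buffer to the tail of a list */
      static void n_hdlc_buf_put(struct n_hdlc_buf_list *buf_list,
                                 struct n_hdlc_buf *buf)
      {
              unsigned long flags;

              spin_lock_irqsave(&buf_list->spinlock, flags);
              list_add_tail(&buf->list_item, &buf_list->list);
              buf_list->count++;
              spin_unlock_irqrestore(&buf_list->spinlock, flags);
      }

      /* on tx error: return the buffer to the *head* of the list so it is
       * retransmitted next; this replaces the racy n_hdlc.tbuf pointer */
      static void n_hdlc_buf_return(struct n_hdlc_buf_list *buf_list,
                                    struct n_hdlc_buf *buf)
      {
              unsigned long flags;

              spin_lock_irqsave(&buf_list->spinlock, flags);
              list_add(&buf->list_item, &buf_list->list);
              buf_list->count++;
              spin_unlock_irqrestore(&buf_list->spinlock, flags);
      }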
      Signed-off-by: Alexander Popov <alex.popov@linux.com>
      
      CVE-2017-2636
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
    • TTY: n_hdlc, fix lockdep false positive · c1c6bc86
      Jiri Slaby authored
      The class of the 4 n_hdlc buf locks is the same because a single
      function, n_hdlc_buf_list_init, is used to initialize all of them. But
      since flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls
      n_hdlc_buf_put, which takes n_hdlc->tx_free_buf_list.spinlock, lockdep
      emits a warning:
      =============================================
      [ INFO: possible recursive locking detected ]
      4.3.0-25.g91e30a7-default #1 Not tainted
      ---------------------------------------------
      a.out/1248 is trying to acquire lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
      
      but task is already holding lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&list->spinlock)->rlock);
        lock(&(&list->spinlock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by a.out/1248:
       #0:  (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50
       #1:  (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      ...
      Call Trace:
      ...
       [<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70
       [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
       [<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc]
       [<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40
      ...
      
      Fix it by initializing the spinlocks separately. This also removes a
      redundant memset of freshly kzallocated space.
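      Lockdep keys each spin_lock_init() call site with its own static lock
      class, so four locks initialized through one shared helper all land in
      the same class and look recursively acquired. A hedged before/after
      sketch (simplified, not the exact diff):

      /* Before: one helper initializes every list, so all four spinlocks
       * share a single lockdep class keyed to this init site. */
      static void n_hdlc_buf_list_init(struct n_hdlc_buf_list *buf_list)
      {
              memset(buf_list, 0, sizeof(*buf_list)); /* redundant after kzalloc */
              spin_lock_init(&buf_list->spinlock);
      }

      /* After: initialize each spinlock at its own call site so each gets a
       * distinct lockdep class, and drop the memset entirely. */
      n_hdlc = kzalloc(sizeof(*n_hdlc), GFP_KERNEL);
      if (!n_hdlc)
              return NULL;

      spin_lock_init(&n_hdlc->rx_free_buf_list.spinlock);
      spin_lock_init(&n_hdlc->tx_free_buf_list.spinlock);
      spin_lock_init(&n_hdlc->rx_buf_list.spinlock);
      spin_lock_init(&n_hdlc->tx_buf_list.spinlock);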
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      CVE-2017-2636
      
      (cherry-picked from e9b736d8 upstream)
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
    • UBUNTU: Start new release · 05eb5f2b
      Stefan Bader authored
      Ignore: yes
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
  3. 20 Feb, 2017 3 commits
  4. 01 Feb, 2017 1 commit
  5. 31 Jan, 2017 5 commits
  6. 30 Jan, 2017 3 commits
  7. 26 Jan, 2017 4 commits
    • mm, oom: prevent premature OOM killer invocation for high order request · 018a3c3c
      Michal Hocko authored
      BugLink: https://bugs.launchpad.net/bugs/1655842
      
      There have been several reports about premature OOM killer invocation
      in the 4.7 kernel when an order-2 allocation request (for the kernel
      stack) invoked the OOM killer even during basic workloads (light IO or
      even a kernel compile on some filesystems).  In all reported cases the
      memory is fragmented and there are no order-2+ pages available.  There
      is usually a large amount of slab memory (usually dentries/inodes) and
      further debugging has shown that there are way too many unmovable
      blocks which are skipped during compaction.  Multiple reporters have
      confirmed that the current linux-next, which includes [1] and [2],
      helped and the OOMs are not reproducible anymore.
      
      A simpler fix for the late rc and stable is to simply ignore the
      compaction feedback and retry as long as there is reclaim progress and
      we are not getting OOM for order-0 pages.  We already do that for
      CONFIG_COMPACTION=n, so let's reuse the same code when compaction is
      enabled as well.
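      Schematically, should_compact_retry with CONFIG_COMPACTION=y stops
      interpreting the compaction verdict and defers to the same
      conservative order-0 check the !CONFIG_COMPACTION build uses (that
      check is sketched under the !CONFIG_COMPACTION commit below).  A
      hedged sketch; should_retry_order0() is a hypothetical stand-in name:

      static inline bool
      should_compact_retry(struct alloc_context *ac, unsigned int order,
                           int alloc_flags, enum compact_result compact_result)
      {
              (void)compact_result; /* feedback deliberately ignored */
              return should_retry_order0(ac, order, alloc_flags);
      }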
      
      [1] http://lkml.kernel.org/r/20160810091226.6709-1-vbabka@suse.cz
      [2] http://lkml.kernel.org/r/f7a9ea9d-bb88-bfd6-e340-3a933559305a@suse.cz
      
      Fixes: 0a0337e0 ("mm, oom: rework oom detection")
      Link: http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Olaf Hering <olaf@aepfle.de>
      Tested-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
      Cc: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>	[4.7.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit 6b4e3181)
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Acked-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>
    • mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION · 3c2eabde
      Michal Hocko authored
      BugLink: https://bugs.launchpad.net/bugs/1655842
      
      Joonsoo has reported that he is able to trigger OOM for !costly high
      order requests (a heavy fork() workload close to OOM) with the new oom
      detection rework.  This is because we rely only on should_reclaim_retry
      when compaction is disabled, and it only checks watermarks for the
      requested order, so we might trigger OOM even when there is a lot of
      free memory.
      
      It is not very clear what the usual workloads are when compaction is
      disabled.  Relying heavily on high order allocations without any
      mechanism to create those orders, except for an unbound amount of
      reclaim, is certainly not a good idea.
      
      To prevent potential regressions, let's help this configuration some.
      We have to sacrifice determinism though, because there simply is none
      possible here.  The should_compact_retry implementation for
      !CONFIG_COMPACTION, which was empty so far, will do a watermark check
      for order-0 on all eligible zones.  This will cause retrying until
      either the reclaim cannot make any further progress or all the zones
      are depleted even for order-0 pages.  This means that the number of
      retries is basically unbounded for !costly orders, but that was the
      case before the rework as well, so this shouldn't regress.
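      A hedged sketch of that order-0 watermark walk, modeled on the change
      as backported here (ac->classzone_idx per the backport note below);
      signatures are approximate to the 4.4-era allocator:

      /* !CONFIG_COMPACTION variant: only !costly orders keep retrying, and
       * only while at least one eligible zone still passes the order-0 min
       * watermark, i.e. reclaim could still produce an order-0 page. */
      static inline bool
      should_compact_retry(struct alloc_context *ac, unsigned int order,
                           int alloc_flags)
      {
              struct zone *zone;
              struct zoneref *z;

              if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
                      return false;

              for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
                                              ac->high_zoneidx, ac->nodemask) {
                      if (zone_watermark_ok(zone, 0, min_wmark_pages(zone),
                                            ac->classzone_idx, alloc_flags))
                              return true;
              }
              return false;
      }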
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1463051677-29418-3-git-send-email-mhocko@kernel.org
      Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      (backported from commit 31e49bfd)
      [cascardo: fixup ac_classzone_idx to ac->classzone_idx]
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Acked-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>
    • mm, oom, compaction: prevent from should_compact_retry looping for ever for costly orders · 8f32a02e
      Michal Hocko authored
      BugLink: https://bugs.launchpad.net/bugs/1655842
      
      "mm: consider compaction feedback also for costly allocation" has
      removed the upper bound for the reclaim/compaction retries based on
      the number of reclaimed pages for costly orders.  While this is
      desirable, the patch missed an interaction between reclaim, compaction
      and the retry logic.  Direct reclaim tries to get zones over the min
      watermark, while compaction backs off and returns COMPACT_SKIPPED when
      all zones are below the low watermark + 1<<order gap.  If we are
      getting really close to OOM then __compaction_suitable can keep
      returning COMPACT_SKIPPED for a high order request (e.g.  hugetlb
      order-9) while the reclaim is not able to release enough pages to get
      us over the low watermark.  The reclaim is still able to make some
      progress (usually thrashing over the few remaining pages) so we are
      not able to break out from the loop.
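      The mismatch between the two thresholds, schematically:

        reclaim stops pushing once:      free_pages >= min_wmark
        compaction is suitable only if:  free_pages >= low_wmark + (1 << order)

      Near OOM, free_pages can sit in the gap between those two values, so
      reclaim keeps reporting progress while __compaction_suitable keeps
      returning COMPACT_SKIPPED, and the retry loop never terminates.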
      
      I have seen this happening with the same test described in "mm:
      consider compaction feedback also for costly allocation" on a swapless
      system.  The original problem got resolved by "vmscan: consider
      classzone_idx in compaction_ready" but it shows how things might go
      wrong when we approach the OOM event horizon.
      
      The reason why compaction requires being over the low rather than the
      min watermark is not clear to me.  This check has been there
      essentially since 56de7263 ("mm: compaction: direct compact when a
      high-order allocation fails").  It is clearly an implementation detail
      though, and we shouldn't pull it into the generic retry logic; we
      should instead be able to cope with such an eventuality.  The only
      place in should_compact_retry where we retry without any upper bound
      is the compaction_withdrawn() case.
      
      Introduce a compaction_zonelist_suitable function which checks the
      given zonelist and returns true only if there is at least one zone
      which would unblock __compaction_suitable if more memory got
      reclaimed.  In this implementation it checks __compaction_suitable
      with NR_FREE_PAGES plus part of the reclaimable memory as the target
      for the watermark check.  The reclaimable memory is reduced linearly
      by the allocation order.  The idea is that we do not want to reclaim
      all the remaining memory for a single allocation request just to
      unblock __compaction_suitable, which doesn't guarantee we will make
      further progress.
      
      The new helper is then used if compaction_withdrawn() feedback was
      provided, so we do not retry if there is no outlook for further
      progress.  !costly requests shouldn't be affected much - e.g.  order-2
      pages would require at least 64kB on the reclaimable LRUs while
      order-9 would need at least 32M, which should be enough to not lock up.
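      A hedged sketch of the helper, following the description above (free
      pages plus an order-scaled share of reclaimable memory fed to the
      existing suitability check); signatures are approximate and the result
      test is simplified to the COMPACT_SKIPPED case named in the log:

      /* Returns true iff at least one eligible zone would stop reporting
       * COMPACT_SKIPPED if an order-scaled share of its reclaimable memory
       * were actually reclaimed.  order > 0 here: compaction only runs for
       * high-order requests. */
      bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
                                        int alloc_flags)
      {
              struct zone *zone;
              struct zoneref *z;

              for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
                                              ac->high_zoneidx, ac->nodemask) {
                      unsigned long available;

                      /* the higher the order, the smaller the share, so a
                       * single request cannot demand that everything be
                       * reclaimed on its behalf */
                      available = zone_reclaimable_pages(zone) / order;
                      available += zone_page_state_snapshot(zone, NR_FREE_PAGES);

                      if (__compaction_suitable(zone, order, alloc_flags,
                                                ac->classzone_idx, available)
                                      != COMPACT_SKIPPED)
                              return true;
              }
              return false;
      }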
      
      [vbabka@suse.cz: fix classzone_idx vs. high_zoneidx usage in compaction_zonelist_suitable]
      [akpm@linux-foundation.org: fix it for Mel's mm-page_alloc-remove-field-from-alloc_context.patch]
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (backported from commit 86a294a8)
      [cascardo: fixup ac_classzone_idx to ac->classzone_idx]
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Acked-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>
    • mm: consider compaction feedback also for costly allocation · 51760132
      Michal Hocko authored
      BugLink: https://bugs.launchpad.net/bugs/1655842
      
      PAGE_ALLOC_COSTLY_ORDER retry logic is currently mostly handled inside
      should_reclaim_retry, where we decide to stop retrying once at least
      order worth of pages has been reclaimed, or once the watermark check
      for at least one zone would succeed after reclaiming all pages when
      the reclaim hasn't made any progress.  Compaction feedback is mostly
      ignored and we just try to make sure that the compaction did at least
      something before giving up.
      
      The first condition was added by a41f24ea ("page allocator: smarter
      retry of costly-order allocations") and it assumed that lumpy reclaim
      could have created a page of the sufficient order.  Lumpy reclaim has
      been removed quite some time ago, so the assumption doesn't hold
      anymore.  Remove the check for the number of reclaimed pages and rely
      solely on the compaction feedback.  should_reclaim_retry now only
      makes sure that we keep retrying reclaim for high order pages if they
      are merely hidden by watermarks, so that order-0 reclaim really makes
      sense.
      
      should_compact_retry now keeps retrying even for costly allocations.
      The number of retries is reduced w.r.t. !costly requests because they
      are less important and harder to grant, so their pressure shouldn't
      cause contention for other requests or cause over-reclaim.  We also do
      not reset no_progress_loops for costly requests, to make sure we do
      not keep reclaiming too aggressively.
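      A hedged sketch of the costly-order retry cap this describes (the
      budget constant and the 1/4 ratio follow the upstream commit; the
      function shape is simplified):

      #define MAX_COMPACT_RETRIES 16

      /* Costly requests may fail and callers are ready to cope with that,
       * so give them only a quarter of the retry budget that !costly
       * (de facto nofail) requests get; the ratio is admittedly arbitrary. */
      static inline bool
      should_compact_retry(unsigned int order, int compaction_retries)
      {
              int max_retries = MAX_COMPACT_RETRIES;

              if (order > PAGE_ALLOC_COSTLY_ORDER)
                      max_retries /= 4;

              return compaction_retries <= max_retries;
      }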
      
      This has been tested by running a process which fragments memory:
      	- compact memory
      	- mmap large portion of the memory (1920M on 2GRAM machine with 2G
      	  of swapspace)
      	- MADV_DONTNEED single page in PAGE_SIZE*((1UL<<MAX_ORDER)-1)
      	  steps until certain amount of memory is freed (250M in my test)
      	  and reduce the step to (step / 2) + 1 after reaching the end of
      	  the mapping
      	- then run a script which populates the page cache 2G (MemTotal)
      	  from /dev/zero to a new file
      And then tries to allocate
      nr_hugepages=$(awk '/MemAvailable/{printf "%d\n", $2/(2*1024)}' /proc/meminfo)
      huge pages.
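      The fragment-mem-and-run tool itself is not shown here; a hedged
      userspace sketch of the fragmentation step it describes (sizes
      hard-coded to the 1920M/250M test values, MAX_ORDER assumed to be the
      x86 default of 11; repeat passes may revisit already-freed pages, so
      the freed count is approximate):

      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      #define MAX_ORDER 11

      int main(void)
      {
              size_t page = sysconf(_SC_PAGESIZE);
              size_t len = 1920UL << 20;            /* 1920M mapping */
              size_t target = 250UL << 20;          /* stop after ~250M freed */
              size_t step = (1UL << MAX_ORDER) - 1; /* stride in pages */
              size_t freed = 0;
              char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

              if (map == MAP_FAILED)
                      return 1;
              memset(map, 1, len); /* fault the whole mapping in */

              /* punch single-page holes one max-order block apart, then
               * shrink the stride after each full pass, leaving the buddy
               * allocator with many unmergeable blocks */
              while (freed < target) {
                      size_t off;

                      for (off = 0; off + page <= len && freed < target;
                           off += step * page)
                              if (madvise(map + off, page, MADV_DONTNEED) == 0)
                                      freed += page;
                      step = step / 2 + 1;
              }
              printf("Done fragmenting. size=%zu freed=%zu\n", len, freed);
              return 0;
      }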
      
      root@test1:~# echo 1 > /proc/sys/vm/overcommit_memory;echo 1 > /proc/sys/vm/compact_memory; ./fragment-mem-and-run /root/alloc_hugepages.sh 1920M 250M
      Node 0, zone      DMA     31     28     31     10      2      0      2      1      2      3      1
      Node 0, zone    DMA32    437    319    171     50     28     25     20     16     16     14    437
      
      * This is the /proc/buddyinfo after the compaction
      
      Done fragmenting. size=2013265920 freed=262144000
      Node 0, zone      DMA    165     48      3      1      2      0      2      2      2      2      0
      Node 0, zone    DMA32  35109  14575    185     51     41     12      6      0      0      0      0
      
      * /proc/buddyinfo after memory got fragmented
      
      Executing "/root/alloc_hugepages.sh"
      Eating some pagecache
      508623+0 records in
      508623+0 records out
      2083319808 bytes (2.1 GB) copied, 11.7292 s, 178 MB/s
      Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
      Node 0, zone    DMA32    111    344    153     20     24     10      3      0      0      0      0
      
      * /proc/buddyinfo after page cache got eaten
      
      Trying to allocate 129
      129
      
      * 129 hugepages requested and all of them granted.
      
      Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
      Node 0, zone    DMA32    127     97     30     99     11      6      2      1      4      0      0
      
      * /proc/buddyinfo after hugetlb allocation.
      
      10 runs will behave as follows:
      Trying to allocate 130
      130
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 128
      128
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 128
      128
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 132
      132
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 128
      128
      --
      Trying to allocate 129
      129
      
      So basically 100% success for all 10 attempts.
      Without the patch numbers looked much worse:
      Trying to allocate 128
      12
      --
      Trying to allocate 129
      14
      --
      Trying to allocate 129
      7
      --
      Trying to allocate 129
      16
      --
      Trying to allocate 129
      30
      --
      Trying to allocate 129
      38
      --
      Trying to allocate 129
      19
      --
      Trying to allocate 129
      37
      --
      Trying to allocate 129
      28
      --
      Trying to allocate 129
      37
      
      Just for completeness, the base kernel without the oom detection
      rework looks as follows:
      Trying to allocate 127
      30
      --
      Trying to allocate 129
      12
      --
      Trying to allocate 129
      52
      --
      Trying to allocate 128
      32
      --
      Trying to allocate 129
      12
      --
      Trying to allocate 129
      10
      --
      Trying to allocate 129
      32
      --
      Trying to allocate 128
      14
      --
      Trying to allocate 128
      16
      --
      Trying to allocate 129
      8
      
      As we can see, the success rate is much more volatile and smaller
      without this patch.  So the patch not only makes the retry logic for
      costly requests more sensible, the success rate is even higher.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit 7854ea6c)
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Acked-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>