1. 22 Aug, 2017 24 commits
  2. 20 Aug, 2017 8 commits
    • Linus Torvalds's avatar
      Linux 4.13-rc6 · 14ccee78
      Linus Torvalds authored
      14ccee78
    • Linus Torvalds's avatar
      Sanitize 'move_pages()' permission checks · 197e7e52
      Linus Torvalds authored
      The 'move_paghes()' system call was introduced long long ago with the
      same permission checks as for sending a signal (except using
      CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).
      
      That turns out to not be a great choice - while the system call really
      only moves physical page allocations around (and you need other
      capabilities to do a lot of it), you can check the return value to map
      out some the virtual address choices and defeat ASLR of a binary that
      still shares your uid.
      
      So change the access checks to the more common 'ptrace_may_access()'
      model instead.
      
      This tightens the access checks for the uid, and also effectively
      changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
      anybody really _uses_ this legacy system call any more (we hav ebetter
      NUMA placement models these days), so I expect nobody to notice.
      
      Famous last words.
      Reported-by: default avatarOtto Ebeling <otto.ebeling@iki.fi>
      Acked-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      197e7e52
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f680d7e
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Another pile of small fixes and updates for x86:
      
         - Plug a hole in the SMAP implementation which misses to clear AC on
           NMI entry
      
         - Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line
           parameter works correctly again
      
         - Use the proper accessor in the startup64 code for next_early_pgt to
           prevent accessing of invalid addresses and faulting in the early
           boot code.
      
         - Prevent CPU hotplug lock recursion in the MTRR code
      
         - Unbreak CPU0 hotplugging
      
         - Rename overly long CPUID bits which got introduced in this cycle
      
         - Two commits which mark data 'const' and restrict the scope of data
           and functions to file scope by making them 'static'"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Constify attribute_group structures
        x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'
        x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
        x86: Fix norandmaps/ADDR_NO_RANDOMIZE
        x86/mtrr: Prevent CPU hotplug lock recursion
        x86: Mark various structures and functions as 'static'
        x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag
        x86/smpboot: Unbreak CPU0 hotplug
        x86/asm/64: Clear AC on NMI entries
      7f680d7e
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2615a38f
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "A few small fixes for timer drivers:
      
         - Prevent infinite recursion in the arm architected timer driver with
           ftrace
      
         - Propagate error codes to the caller in case of failure in EM STI
           driver
      
         - Adjust a bogus loop iteration in the arm architected timer driver
      
         - Add a missing Kconfig dependency to the pistachio clocksource to
           prevent build failures
      
         - Correctly check for IS_ERR() instead of NULL in the shared timer-of
           code"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is enabled
        clocksource/drivers/Kconfig: Fix CLKSRC_PISTACHIO dependencies
        clocksource/drivers/timer-of: Checking for IS_ERR() instead of NULL
        clocksource/drivers/em_sti: Fix error return codes in em_sti_probe()
        clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization
      2615a38f
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e46db8d2
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "Two fixes for the perf subsystem:
      
         - Fix an inconsistency of RDPMC mm struct tagging across exec() which
           causes RDPMC to fault.
      
         - Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
           causes incorrect timestamps and total time calculations"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Fix time on IOC_ENABLE
        perf/x86: Fix RDPMC vs. mm_struct tracking
      e46db8d2
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9dae41a2
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A pile of smallish changes all over the place:
      
         - Add a missing ISB in the GIC V1 driver
      
         - Remove an ACPI version check in the GIC V3 ITS driver
      
         - Add the missing irq_pm_shutdown function for BRCMSTB-L2 to avoid
           spurious wakeups
      
         - Remove the artifical limitation of ITS instances to the number of
           NUMA nodes which prevents utilizing the ITS hardware correctly
      
         - Prevent a infinite parsing loop in the GIC-V3 ITS/MSI code
      
         - Honour the force affinity argument in the GIC-V3 driver which is
           required to make perf work correctly
      
         - Correctly report allocation failures in GIC-V2/V3 to avoid using
           half allocated and initialized interrupts.
      
         - Fixup checks against nr_cpu_ids in the generic IPI code"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/ipi: Fixup checks against nr_cpu_ids
        genirq: Restore trigger settings in irq_modify_status()
        MAINTAINERS: Remove Jason Cooper's irqchip git tree
        irqchip/gic-v3-its-platform-msi: Fix msi-parent parsing loop
        irqchip/gic-v3-its: Allow GIC ITS number more than MAX_NUMNODES
        irqchip: brcmstb-l2: Define an irq_pm_shutdown function
        irqchip/gic: Ensure we have an ISB between ack and ->handle_irq
        irqchip/gic-v3-its: Remove ACPICA version check for ACPI NUMA
        irqchip/gic-v3: Honor forced affinity setting
        irqchip/gic-v3: Report failures in gic_irq_domain_alloc
        irqchip/gic-v2: Report failures in gic_irq_domain_alloc
        irqchip/atmel-aic: Remove root argument from ->fixup() prototype
        irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
        irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
      9dae41a2
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e18a5ebc
      Linus Torvalds authored
      Pull watchdog fix from Thomas Gleixner:
       "A fix for the hardlockup watchdog to prevent false positives with
        extreme Turbo-Modes which make the perf/NMI watchdog fire faster than
        the hrtimer which is used to verify.
      
        Slightly larger than the minimal fix, which just would increase the
        hrtimer frequency, but comes with extra overhead of more watchdog
        timer interrupts and thread wakeups for all users.
      
        With this change we restrict the overhead to the extreme Turbo-Mode
        systems"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kernel/watchdog: Prevent false positives with turbo modes
      e18a5ebc
    • Alexey Dobriyan's avatar
      genirq/ipi: Fixup checks against nr_cpu_ids · 8fbbe2d7
      Alexey Dobriyan authored
      Valid CPU ids are [0, nr_cpu_ids-1] inclusive.
      
      Fixes: 3b8e29a8 ("genirq: Implement ipi_send_mask/single()")
      Fixes: f9bce791 ("genirq: Add a new function to get IPI reverse mapping")
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170819095751.GB27864@avx2
      8fbbe2d7
  3. 18 Aug, 2017 8 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 58d4e450
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "14 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
        mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM
        mm/mempolicy: fix use after free when calling get_mempolicy
        mm/cma_debug.c: fix stack corruption due to sprintf usage
        signal: don't remove SIGNAL_UNKILLABLE for traced tasks.
        mm, oom: fix potential data corruption when oom_reaper races with writer
        mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS
        slub: fix per memcg cache leak on css offline
        mm: discard memblock data later
        test_kmod: fix description for -s -and -c parameters
        kmod: fix wait on recursive loop
        wait: add wait_event_killable_timeout()
        kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog
        mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
      58d4e450
    • Kees Cook's avatar
      mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes · c715b72c
      Kees Cook authored
      Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
      broke AddressSanitizer.  This is a partial revert of:
      
        eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
        02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
      
      The AddressSanitizer tool has hard-coded expectations about where
      executable mappings are loaded.
      
      The motivation for changing the PIE base in the above commits was to
      avoid the Stack-Clash CVEs that allowed executable mappings to get too
      close to heap and stack.  This was mainly a problem on 32-bit, but the
      64-bit bases were moved too, in an effort to proactively protect those
      systems (proofs of concept do exist that show 64-bit collisions, but
      other recent changes to fix stack accounting and setuid behaviors will
      minimize the impact).
      
      The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC
      base), so only the 64-bit PIE base needs to be reverted to let x86 and
      arm64 ASan binaries run again.  Future changes to the 64-bit PIE base on
      these architectures can be made optional once a more dynamic method for
      dealing with AddressSanitizer is found.  (e.g.  always loading PIE into
      the mmap region for marked binaries.)
      
      Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast
      Fixes: eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
      Fixes: 02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: default avatarKostya Serebryany <kcc@google.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c715b72c
    • Laura Abbott's avatar
      mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM · 704b862f
      Laura Abbott authored
      Commit 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly") added
      use of __GFP_HIGHMEM for allocations.  vmalloc_32 may use
      GFP_DMA/GFP_DMA32 which does not play nice with __GFP_HIGHMEM and will
      trigger a BUG in gfp_zone.
      
      Only add __GFP_HIGHMEM if we aren't using GFP_DMA/GFP_DMA32.
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1482249
      Link: http://lkml.kernel.org/r/20170816220705.31374-1-labbott@redhat.com
      Fixes: 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
      Signed-off-by: default avatarLaura Abbott <labbott@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      704b862f
    • zhong jiang's avatar
      mm/mempolicy: fix use after free when calling get_mempolicy · 73223e4e
      zhong jiang authored
      I hit a use after free issue when executing trinity and repoduced it
      with KASAN enabled.  The related call trace is as follows.
      
        BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr ffff8801f582d766
        Read of size 2 by task syz-executor1/798
      
        INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
           __slab_alloc+0x768/0x970
           kmem_cache_alloc+0x2e7/0x450
           mpol_new.part.2+0x74/0x160
           mpol_new+0x66/0x80
           SyS_mbind+0x267/0x9f0
           system_call_fastpath+0x16/0x1b
        INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
           __slab_free+0x495/0x8e0
           kmem_cache_free+0x2f3/0x4c0
           __mpol_put+0x2b/0x40
           SyS_mbind+0x383/0x9f0
           system_call_fastpath+0x16/0x1b
        INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
        INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600
      
        Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
        Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
        Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb                          ........
        Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
        Memory state around the buggy address:
        ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        >ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc
      
      !shared memory policy is not protected against parallel removal by other
      thread which is normally protected by the mmap_sem.  do_get_mempolicy,
      however, drops the lock midway while we can still access it later.
      
      Early premature up_read is a historical artifact from times when
      put_user was called in this path see https://lwn.net/Articles/124754/
      but that is gone since 8bccd85f ("[PATCH] Implement sys_* do_*
      layering in the memory policy layer.").  but when we have the the
      current mempolicy ref count model.  The issue was introduced
      accordingly.
      
      Fix the issue by removing the premature release.
      
      Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.comSigned-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[2.6+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      73223e4e
    • Prakash Gupta's avatar
      mm/cma_debug.c: fix stack corruption due to sprintf usage · da094e42
      Prakash Gupta authored
      name[] in cma_debugfs_add_one() can only accommodate 16 chars including
      NULL to store sprintf output.  It's common for cma device name to be
      larger than 15 chars.  This can cause stack corrpution.  If the gcc
      stack protector is turned on, this can cause a panic due to stack
      corruption.
      
      Below is one example trace:
      
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
        ffffff8e69a75730
        Call trace:
           dump_backtrace+0x0/0x2c4
           show_stack+0x20/0x28
           dump_stack+0xb8/0xf4
           panic+0x154/0x2b0
           print_tainted+0x0/0xc0
           cma_debugfs_init+0x274/0x290
           do_one_initcall+0x5c/0x168
           kernel_init_freeable+0x1c8/0x280
      
      Fix the short sprintf buffer in cma_debugfs_add_one() by using
      scnprintf() instead of sprintf().
      
      Link: http://lkml.kernel.org/r/1502446217-21840-1-git-send-email-guptap@codeaurora.org
      Fixes: f318dd08 ("cma: Store a name in the cma structure")
      Signed-off-by: default avatarPrakash Gupta <guptap@codeaurora.org>
      Acked-by: default avatarLaura Abbott <labbott@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      da094e42
    • Jamie Iles's avatar
      signal: don't remove SIGNAL_UNKILLABLE for traced tasks. · eb61b591
      Jamie Iles authored
      When forcing a signal, SIGNAL_UNKILLABLE is removed to prevent recursive
      faults, but this is undesirable when tracing.  For example, debugging an
      init process (whether global or namespace), hitting a breakpoint and
      SIGTRAP will force SIGTRAP and then remove SIGNAL_UNKILLABLE.
      Everything continues fine, but then once debugging has finished, the
      init process is left killable which is unlikely what the user expects,
      resulting in either an accidentally killed init or an init that stops
      reaping zombies.
      
      Link: http://lkml.kernel.org/r/20170815112806.10728-1-jamie.iles@oracle.comSigned-off-by: default avatarJamie Iles <jamie.iles@oracle.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eb61b591
    • Michal Hocko's avatar
      mm, oom: fix potential data corruption when oom_reaper races with writer · 6b31d595
      Michal Hocko authored
      Wenwei Tao has noticed that our current assumption that the oom victim
      is dying and never doing any visible changes after it dies, and so the
      oom_reaper can tear it down, is not entirely true.
      
      __task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
      but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
      So there is a race window when some threads won't have
      fatal_signal_pending while the oom_reaper could start unmapping the
      address space.  Moreover some paths might not check for fatal signals
      before each PF/g-u-p/copy_from_user.
      
      We already have a protection for oom_reaper vs.  PF races by checking
      MMF_UNSTABLE.  This has been, however, checked only for kernel threads
      (use_mm users) which can outlive the oom victim.  A simple fix would be
      to extend the current check in handle_mm_fault for all tasks but that
      wouldn't be sufficient because the current check assumes that a kernel
      thread would bail out after EFAULT from get_user*/copy_from_user and
      never re-read the same address which would succeed because the PF path
      has established page tables already.  This seems to be the case for the
      only existing use_mm user currently (virtio driver) but it is rather
      fragile in general.
      
      This is even more fragile in general for more complex paths such as
      generic_perform_write which can re-read the same address more times
      (e.g.  iov_iter_copy_from_user_atomic to fail and then
      iov_iter_fault_in_readable on retry).
      
      Therefore we have to implement MMF_UNSTABLE protection in a robust way
      and never make a potentially corrupted content visible.  That requires
      to hook deeper into the PF path and check for the flag _every time_
      before a pte for anonymous memory is established (that means all
      !VM_SHARED mappings).
      
      The corruption can be triggered artificially
      (http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
      but there doesn't seem to be any real life bug report.  The race window
      should be quite tight to trigger most of the time.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarWenwei Tao <wenwei.tww@alibaba-inc.com>
      Tested-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6b31d595
    • Michal Hocko's avatar
      mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS · 5b53a6ea
      Michal Hocko authored
      Tetsuo Handa has noticed that MMF_UNSTABLE SIGBUS path in
      handle_mm_fault causes a lockdep splat
      
        Out of memory: Kill process 1056 (a.out) score 603 or sacrifice child
        Killed process 1056 (a.out) total-vm:4268108kB, anon-rss:2246048kB, file-rss:0kB, shmem-rss:0kB
        a.out (1169) used greatest stack depth: 11664 bytes left
        DEBUG_LOCKS_WARN_ON(depth <= 0)
        ------------[ cut here ]------------
        WARNING: CPU: 6 PID: 1339 at kernel/locking/lockdep.c:3617 lock_release+0x172/0x1e0
        CPU: 6 PID: 1339 Comm: a.out Not tainted 4.13.0-rc3-next-20170803+ #142
        Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
        RIP: 0010:lock_release+0x172/0x1e0
        Call Trace:
           up_read+0x1a/0x40
           __do_page_fault+0x28e/0x4c0
           do_page_fault+0x30/0x80
           page_fault+0x28/0x30
      
      The reason is that the page fault path might have dropped the mmap_sem
      and returned with VM_FAULT_RETRY.  MMF_UNSTABLE check however rewrites
      the error path to VM_FAULT_SIGBUS and we always expect mmap_sem taken in
      that path.  Fix this by taking mmap_sem when VM_FAULT_RETRY is held in
      the MMF_UNSTABLE path.
      
      We cannot simply add VM_FAULT_SIGBUS to the existing error code because
      all arch specific page fault handlers and g-u-p would have to learn a
      new error code combination.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-2-mhocko@kernel.org
      Fixes: 3f70dc38 ("mm: make sure that kthreads will not refault oom reaped memory")
      Reported-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Wenwei Tao <wenwei.tww@alibaba-inc.com>
      Cc: <stable@vger.kernel.org>	[4.9+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b53a6ea