1. 21 Jun, 2017 2 commits
    • time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD · 369adf04
      John Stultz authored
      CONFIG_GENERIC_TIME_VSYSCALL_OLD was introduced five years ago
      to allow a transition from the old vsyscall implementations to
      the new method (which simplified internal accounting and made
      timekeeping more precise).
      
      However, PPC and IA64 have yet to make the transition, despite
      my sending test patches in some cases to try to help it along.
      
      http://patches.linaro.org/patch/30501/
      http://patches.linaro.org/patch/35412/
      
      If it's helpful, my last pass at the patches can be found here:
      https://git.linaro.org/people/john.stultz/linux.git dev/oldvsyscall-cleanup
      
      So I think it's time to set a deadline and make it clear this
      is going away. This patch therefore adds warnings that this
      functionality will be dropped, likely in v4.15.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • time: Clean up CLOCK_MONOTONIC_RAW time handling · fc6eead7
      John Stultz authored
      Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
      remove the duplicative tk->raw_time.tv_nsec, which can be
      stored in tk->tkr_raw.xtime_nsec (similarly to how it's handled
      for monotonic time).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
  2. 20 Jun, 2017 4 commits
    • Merge branch 'clockevents/4.12-fixes' of... · 8e6cec1c
      Thomas Gleixner authored
      Merge branch 'clockevents/4.12-fixes' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent
      
      Pull clockevents fixes from Daniel Lezcano:
      
       - Fixed the wrong iomem area being unmapped in the arch_arm_timer (Frank Rowand)
      
       - Added missing includes for sun5i and cadence-ttc (Stephen Rothwell)
    • arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW · dbb236c1
      Will Deacon authored
      Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
      49eea433 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
      clock_gettime() vDSO"). Noticing that the core timekeeping code
      never set tkr_raw.xtime_nsec, the vDSO implementation didn't
      bother exposing it via the data page and instead took the
      unshifted tk->raw_time.tv_nsec value which was then immediately
      shifted left in the vDSO code.
      
      Unfortunately, accelerating the MONOTONIC_RAW clockid uncovered
      potential 1ns time inconsistencies caused by the timekeeping core
      not handling sub-ns resolution.
      
      Now that the core code has been fixed and is actually setting
      tkr_raw.xtime_nsec, we need to take that into account in the
      vDSO by adding it to the shifted raw_time value, in order to
      fix the user-visible inconsistency. Rather than do that at each
      use (and expand the data page in the process), instead perform
      the shift/addition operation when populating the data page and
      remove the shift from the vDSO code entirely.
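      A minimal sketch of the writer/reader split described above, in
      plain C, assuming simplified stand-ins for the timekeeper and the
      vDSO data page (the struct and function names are hypothetical,
      not the arm64 code). The writer folds the whole-ns value and the
      sub-ns remainder into one shifted field; the reader adds it to the
      scaled cycle delta before the single final shift.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins; field names are illustrative only. */
struct raw_tk {
	uint64_t tv_nsec;     /* whole nanoseconds */
	uint64_t xtime_nsec;  /* sub-ns remainder, already << shift */
	uint32_t shift;
};

struct vdso_page {
	uint64_t raw_time_nsec; /* (tv_nsec << shift) + xtime_nsec */
	uint64_t cycle_last;
	uint32_t mult;
	uint32_t shift;
};

/* Writer: do the shift/addition once, when populating the data page. */
static void populate_vdso_page(struct vdso_page *vd, const struct raw_tk *tk,
			       uint32_t mult, uint64_t cycle_last)
{
	vd->raw_time_nsec = (tk->tv_nsec << tk->shift) + tk->xtime_nsec;
	vd->cycle_last = cycle_last;
	vd->mult = mult;
	vd->shift = tk->shift;
}

/* Reader: scale the cycle delta, add the pre-shifted base, shift once. */
static uint64_t vdso_raw_nsec(const struct vdso_page *vd, uint64_t cycles)
{
	uint64_t delta = cycles - vd->cycle_last;

	return (delta * vd->mult + vd->raw_time_nsec) >> vd->shift;
}
```

      With tv_nsec = 500, xtime_nsec = 200, shift = 8 and mult = 100, a
      read one cycle after cycle_last yields (100 + 128200) >> 8 = 501;
      dropping xtime_nsec would give 500, the kind of 1ns inconsistency
      the commit describes.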
      
      [jstultz: minor whitespace tweak, tried to improve commit
       message to make it more clear this fixes a regression]
      Reported-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · 3d88d56c
      John Stultz authored
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in its vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the issue for in-kernel users.
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
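      The 1ns discontinuity can be reproduced with a small model. This
      sketch assumes made-up mult/shift values and a simplified reader,
      base + ((delta * mult + remainder) >> shift); it is not the
      kernel's timekeeping code, and the helper names are hypothetical.
      With two intervals pending, the reader sums their fractional
      nanoseconds before truncating, while the old accumulation
      truncated each interval separately, so time steps back by 1ns at
      accumulation. Carrying the shifted remainder avoids that.

```c
#include <stdint.h>

/* Hypothetical clocksource parameters, chosen so that one accumulation
 * interval is a non-integer number of nanoseconds (~10003.9 ns). */
#define SHIFT    8u
#define MULT     2561u
#define INTERVAL 1000u   /* cycles per accumulation tick */

/* Reader: ns = base_ns + ((delta * mult + base_snsec) >> shift) */
static uint64_t read_ns(uint64_t base_ns, uint64_t base_snsec, uint64_t delta)
{
	return base_ns + ((delta * MULT + base_snsec) >> SHIFT);
}

/* Time observed just before accumulation, with `ticks` intervals pending. */
static uint64_t time_before(unsigned ticks)
{
	return read_ns(0, 0, (uint64_t)ticks * INTERVAL);
}

/* Old-style accumulation: each tick adds a pre-truncated whole-ns
 * interval, so the sub-ns remainder is thrown away every time. */
static uint64_t old_time_after(unsigned ticks)
{
	uint64_t raw_interval = ((uint64_t)INTERVAL * MULT) >> SHIFT;

	return read_ns(ticks * raw_interval, 0, 0);
}

/* New-style accumulation: keep shifted ns and carry the remainder in
 * an xtime_nsec-like field instead of dropping it. */
static uint64_t new_time_after(unsigned ticks)
{
	uint64_t snsec = (uint64_t)ticks * INTERVAL * MULT;

	return read_ns(snsec >> SHIFT, snsec & ((1u << SHIFT) - 1), 0);
}
```

      With these values, time_before(2) and new_time_after(2) both
      return 20007, while old_time_after(2) returns 20006.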
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: Fix clock->read(clock) race around clocksource changes · ceea5e37
      John Stultz authored
      In tests which exercise switching of clocksources, a NULL
      pointer dereference can be observed on ARM64 platforms in the
      clocksource read() function:
      
      u64 clocksource_mmio_readl_down(struct clocksource *c)
      {
      	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
      }
      
      This is called from the core timekeeping code via:
      
      	cycle_now = tkr->read(tkr->clock);
      
      tkr->read is the cached tkr->clock->read() function pointer.
      When the clocksource is changed then tkr->clock and tkr->read
      are updated sequentially. The code above results in a sequential
      load operation of tkr->read and tkr->clock as well.
      
      If the store to tkr->clock hits between the loads of tkr->read
      and tkr->clock, then the old read() function is called with the
      new clock pointer. As a consequence the read() function
      dereferences a different data structure and the resulting 'reg'
      pointer can point anywhere including NULL.
      
      This problem was introduced when the timekeeping code was
      switched over to use struct tk_read_base. Before that, it was
      theoretically possible as well when the compiler decided to
      reload clock in the code sequence:
      
           now = tk->clock->read(tk->clock);
      
      Add a helper function which avoids the issue by reading
      tk_read_base->clock once into a local variable clk and then issuing
      the read function via clk->read(clk). This guarantees that the
      read() function always gets the proper clocksource pointer handed
      in.
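      A userspace sketch of the helper described above, assuming a
      mocked struct clocksource with only a read() callback and a
      volatile load standing in for the kernel's READ_ONCE(); the macro
      name and mock_read() are hypothetical. The pointer is loaded once,
      and that same value supplies both the function and its argument.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Minimal mock of a clocksource: read() is handed the clocksource itself. */
struct clocksource {
	u64 (*read)(struct clocksource *cs);
	u64 counter;   /* hypothetical backing value for the mock */
};

struct tk_read_base {
	struct clocksource *clock;
};

/* Userspace stand-in for READ_ONCE(): force a single load of the pointer. */
#define READ_CLOCK_ONCE(p) (*(struct clocksource *volatile *)&(p))

/* Load tkr->clock once into a local, then use that same pointer for both
 * the callback and its argument, so they cannot come from two different
 * clocksources even if tkr->clock is switched concurrently. */
static u64 tk_clock_read(struct tk_read_base *tkr)
{
	struct clocksource *clock = READ_CLOCK_ONCE(tkr->clock);

	return clock->read(clock);
}

static u64 mock_read(struct clocksource *cs)
{
	return cs->counter;
}
```

      This removes the mismatched read()/clock pairing; with no separate
      cached read pointer left, there is nothing else to keep in sync.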
      
      Since there is now no use for the tkr.read pointer, this patch
      also removes it, and to address stopping the fast timekeeper
      during suspend/resume, it introduces a dummy clocksource to use
      rather than just a dummy read function.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 19 Jun, 2017 8 commits
    • Linux 4.12-rc6 · 41f1830f
      Linus Torvalds authored
    • mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins authored
      Stack guard page is a useful feature to reduce the risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses alloca() allocations as large as 64kB in many
      commonly used functions. Others use constructs like gid_t
      buffer[NGROUPS_MAX], which is 256kB, or stack strings with MAX_ARG_STRLEN.
      
      This is especially dangerous for suid binaries and with the default of
      no stack size limit, because those applications can be tricked into
      consuming a large portion of the stack, and a single glibc call could
      then jump over the guard page. These attacks are not theoretical,
      unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      PAGE_SIZE units), which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somewhat inherent, but it should reduce the attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation-wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
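      A simplified sketch of the two gap helpers named above, assuming a
      trimmed-down struct vm_area_struct, illustrative flag values, 4k
      pages and a default gap of 256 pages (1MB). The clamping keeps a
      VMA near the bottom or top of the address space from wrapping
      around when the gap is applied.

```c
/* Hypothetical, trimmed-down VMA; flag values are illustrative only. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define VM_GROWSDOWN 0x1UL
#define VM_GROWSUP   0x2UL

struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
};

/* 256 pages: 1MB with 4k pages; scales with the page size. */
static unsigned long stack_guard_gap = 256UL << PAGE_SHIFT;

/* Lowest address the VMA claims, counting the guard gap below a
 * downward-growing stack; clamp to 0 if the subtraction wraps. */
static unsigned long vm_start_gap(const struct vm_area_struct *vma)
{
	unsigned long vm_start = vma->vm_start;

	if (vma->vm_flags & VM_GROWSDOWN) {
		vm_start -= stack_guard_gap;
		if (vm_start > vma->vm_start)
			vm_start = 0;
	}
	return vm_start;
}

/* Highest address the VMA claims, counting the guard gap above an
 * upward-growing stack; clamp to the last page if the addition wraps. */
static unsigned long vm_end_gap(const struct vm_area_struct *vma)
{
	unsigned long vm_end = vma->vm_end;

	if (vma->vm_flags & VM_GROWSUP) {
		vm_end += stack_guard_gap;
		if (vm_end < vma->vm_end)
			vm_end = -PAGE_SIZE;
	}
	return vm_end;
}
```

      For a downward-growing stack VMA starting at 0x200000,
      vm_start_gap() returns 0x100000, so a free-area search that uses it
      treats the 1MB below the stack as occupied without the gap being
      accounted inside the stack vma itself.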
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 1132d5e7
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Stream of fixes has slowed down, only a few this week:
      
         - Some DT fixes for Allwinner platforms, and addition of a clock to
           the R_CCU clock controller that had been missed.
      
         - A couple of small DT fixes for am335x-sl50"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
        ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
        ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
        ARM: dts: am335x-sl50: Fix card detect pin for mmc1
        arm64: allwinner: h5: Remove syslink to shared DTSI
        ARM: sunxi: h3/h5: fix the compatible of R_CCU
    • Merge tag 'sunxi-fixes-for-4.12' of... · a1858df9
      Olof Johansson authored
      Merge tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
      
      Allwinner fixes for 4.12
      
      A few fixes around the PRCM support that went into 4.12 with the wrong
      compatible, and a missing clock in the binding.
      
      * tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
        arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
        ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
        arm64: allwinner: h5: Remove syslink to shared DTSI
        ARM: sunxi: h3/h5: fix the compatible of R_CCU
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'omap-for-v4.12/fixes-sl50' of... · 51b6e281
      Olof Johansson authored
      Merge tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
      
      Two fixes for am335x-sl50: fix a boot-time error when claiming
      SPI pins, and fix the SDIO card detect pin for the production
      version of the device.
      
      * tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
        ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
        ARM: dts: am335x-sl50: Fix card detect pin for mmc1
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 3696e4f0
      Linus Torvalds authored
      Pull virtio bugfix from Michael Tsirkin:
       "It turns out balloon does not handle IOMMUs correctly. We should fix
        that at some point, for now let's just disable this configuration"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_balloon: disable VIOMMU support
    • Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 7d62d947
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Two driver bugfixes"
      
      * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: ismt: fix wrong device address when unmap the data buffer
        i2c: rcar: use correct length when unmapping DMA
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · b3ee4edd
      Linus Torvalds authored
      Pull MIPS fixes from Ralf Baechle:
      
       - Three highmem fixes:
          + Fixed mapping initialization
          + Adjust the pkmap location
          + Ensure we use at most one page for PTEs
      
       - Fix makefile dependencies for .its targets to depend on vmlinux
      
       - Fix reversed condition in BNEZC and JIALC software branch emulation
      
       - Only flush initialized flush_insn_slot to avoid NULL pointer
         dereference
      
       - perf: Remove incorrect odd/even counter handling for I6400
      
       - ftrace: Fix init functions tracing
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: .its targets depend on vmlinux
        MIPS: Fix bnezc/jialc return address calculation
        MIPS: kprobes: flush_insn_slot should flush only if probe initialised
        MIPS: ftrace: fix init functions tracing
        MIPS: mm: adjust PKMAP location
        MIPS: highmem: ensure that we don't use more than one page for PTEs
        MIPS: mm: fixed mappings: correct initialisation
        MIPS: perf: Remove incorrect odd/even counter handling for I6400
  4. 18 Jun, 2017 7 commits
  5. 17 Jun, 2017 7 commits
  6. 16 Jun, 2017 12 commits
    • Merge tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 1439ccf7
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - fix another PCI_ENDPOINT build error (merged for v4.12)
      
       - fix error codes added to config accessors for v4.12
      
      * tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: endpoint: Select CRC32 to fix test build error
        PCI: Make error code types consistent in pci_{read,write}_config_*
    • Merge tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux · 3a448294
      Linus Torvalds authored
      Pull fbdev fixes from Bartlomiej Zolnierkiewicz:
      
       - fix udlfb driver to stop spamming logs (Mike Gerow)
      
       - add missing endianness conversions in smscufx & udlfb drivers (Johan
         Hovold)
      
       - fix a few gcc warnings/errors (Arnd Bergmann)
      
      * tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux:
        video: fbdev: udlfb: drop log level for blanking
        video: fbdev: via: remove possibly unused variables
        video: fbdev: add missing USB-descriptor endianness conversions
        video: fbdev: avoid int-in-bool-context warning
    • Merge branch 'akpm' (patches from Andrew) · 162f73f4
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "5 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: correct the comment when reclaimed pages exceed the scanned pages
        userfaultfd: shmem: handle coredumping in handle_userfault()
        mm: numa: avoid waiting on freed migrated pages
        swap: cond_resched in swap_cgroup_prepare()
        mm/memory-failure.c: use compound_head() flags for huge pages
    • mm: correct the comment when reclaimed pages exceed the scanned pages · d7143e31
      zhongjiang authored
      Commit e1587a49 ("mm: vmpressure: fix sending wrong events on
      underflow") stated that reclaimed pages can exceed scanned pages
      due to THP reclaim.

      That is incorrect, because a THP will be split into normal pages and
      looped over again, which increments the scanned page count.
      
      [akpm@linux-foundation.org: tweak comment text]
      Link: http://lkml.kernel.org/r/1496824266-25235-1-git-send-email-zhongjiang@huawei.com
      Signed-off-by: zhongjiang <zhongjiang@huawei.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd: shmem: handle coredumping in handle_userfault() · 64c2b203
      Andrea Arcangeli authored
      Anon and hugetlbfs handle FOLL_DUMP set by get_dump_page() internally to
      __get_user_pages().
      
      shmem, by contrast, has no special FOLL_DUMP handling there, so
      handle_mm_fault() is invoked without mmap_sem and ends up calling
      handle_userfault(), which does not expect to be invoked without
      mmap_sem held.
      
      This makes handle_userfault() fail immediately if invoked through
      shmem_vm_ops->fault during coredumping and solves the problem.
      
      The side effect is a BUG_ON with no lock held triggered by the
      coredumping process which exits.  Only 4.11 is affected; pre-4.11, anon
      memory holes are skipped in __get_user_pages by checking FOLL_DUMP
      explicitly against empty pagetables (mm/gup.c:no_page_table()).
      
      It's zero cost, as we already had a check on current->flags to prevent
      futex from triggering userfaults during exit (PF_EXITING).
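      A sketch of why the check is cheap: the fault path already tests
      current->flags for PF_EXITING, so folding a core-dump flag into the
      same mask adds no extra work. The flag values and the helper name
      are hypothetical, and which core-dump flag the patch tests is an
      assumption here.

```c
#include <stdbool.h>

/* Illustrative flag bits, not the kernel's real values. */
#define PF_EXITING  0x1u
#define PF_DUMPCORE 0x2u   /* assumed core-dump flag */

/* Hypothetical helper: true if handle_userfault() should fail immediately
 * instead of waiting, because the task is exiting or dumping core.
 * One AND against a combined mask, so the second flag costs nothing. */
static bool userfault_must_bail(unsigned int task_flags)
{
	return (task_flags & (PF_EXITING | PF_DUMPCORE)) != 0;
}
```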
      
      Link: http://lkml.kernel.org/r/20170615214838.27429-1-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.11+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: numa: avoid waiting on freed migrated pages · 3c226c63
      Mark Rutland authored
      In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
      waiting until the pmd is unlocked before we return and retry.  However,
      we can race with migrate_misplaced_transhuge_page():
      
          // do_huge_pmd_numa_page                // migrate_misplaced_transhuge_page()
          // Holds 0 refs on page                 // Holds 2 refs on page
      
          vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
          /* ... */
          if (pmd_trans_migrating(*vmf->pmd)) {
                  page = pmd_page(*vmf->pmd);
                  spin_unlock(vmf->ptl);
                                                  ptl = pmd_lock(mm, pmd);
                                                  if (page_count(page) != 2) {
                                                          /* roll back */
                                                  }
                                                  /* ... */
                                                  mlock_migrate_page(new_page, page);
                                                  /* ... */
                                                  spin_unlock(ptl);
                                                  put_page(page);
                                                  put_page(page); // page freed here
                  wait_on_page_locked(page);
                  goto out;
          }
      
      This can result in the freed page having its waiters flag set
      unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
      page alloc/free functions.  This has been observed on arm64 KVM guests.
      
      We can avoid this by having do_huge_pmd_numa_page() take a reference on
      the page before dropping the pmd lock, mirroring what we do in
      __migration_entry_wait().
      
      When we hit the race, migrate_misplaced_transhuge_page() will see the
      reference and abort the migration, as it may do today in other cases.
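      A single-threaded model of the fix, assuming a bare reference count
      and the migration side's "exactly two references" check from the
      race above; struct page, get_page(), put_page() and migrate_model()
      here are simplified stand-ins, not the kernel's. Without the extra
      pin the migration frees the page under the waiter; with it, the
      migration sees three references and backs off.

```c
#include <stdbool.h>

/* Simplified stand-in: only a refcount and a "freed" marker. */
struct page {
	int refcount;
	bool freed;
};

static void get_page(struct page *p)
{
	p->refcount++;
}

static void put_page(struct page *p)
{
	if (--p->refcount == 0)
		p->freed = true;
}

/* Migration side (model): proceed only when holding the only two
 * references, then drop both; returns true if migration went ahead. */
static bool migrate_model(struct page *p)
{
	if (p->refcount != 2)
		return false;          /* roll back: someone else pinned it */
	put_page(p);
	put_page(p);               /* page freed here */
	return true;
}
```

      In the fixed flow, the fault path calls get_page() before dropping
      the pmd lock and put_page() after wait_on_page_locked() returns.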
      
      Fixes: b8916634 ("mm: Prevent parallel splits during THP migration")
      Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Steve Capper <steve.capper@arm.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swap: cond_resched in swap_cgroup_prepare() · ef707629
      Yu Zhao authored
      I saw need_resched() warnings when swapping on a large swapfile (TBs)
      because continuously allocating many pages in swap_cgroup_prepare() took
      too long.

      We already cond_resched when freeing pages in swap_cgroup_swapoff().  Do
      the same for the page allocation.
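      A userspace analogue of the pattern, not kernel code: a long
      allocation loop that gives up the CPU once per iteration. Here
      sched_yield() stands in for cond_resched(); unlike cond_resched(),
      it yields unconditionally rather than only when a reschedule is
      pending. The function name is hypothetical.

```c
#include <sched.h>
#include <stdlib.h>

/* Allocate n buffers, yielding after each one the way the fix adds
 * cond_resched() to the loop. On failure, free what was allocated
 * and return -1; return 0 on success. */
static int prepare_buffers(void **bufs, size_t n, size_t size)
{
	for (size_t i = 0; i < n; i++) {
		bufs[i] = malloc(size);
		if (!bufs[i]) {
			while (i--)
				free(bufs[i]);
			return -1;
		}
		sched_yield();   /* stand-in for cond_resched() */
	}
	return 0;
}
```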
      
      Link: http://lkml.kernel.org/r/20170604200109.17606-1-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory-failure.c: use compound_head() flags for huge pages · 7258ae5c
      James Morse authored
      memory_failure() chooses a recovery action function based on the page
      flags.  For huge pages it uses the tail page flags which don't have
      anything interesting set, resulting in:
      
      > Memory failure: 0x9be3b4: Unknown page state
      > Memory failure: 0x9be3b4: recovery action for unknown page: Failed
      
      Instead, save a copy of the head page's flags if this is a huge page.
      This means that if there are no relevant flags for this tail page, we
      use the head page's flags instead.  This results in the me_huge_page()
      recovery action being called:
      
      > Memory failure: 0x9b7969: recovery action for huge page: Delayed
      
      For hugepages that have not yet been allocated, this allows the hugepage
      to be dequeued.
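      A minimal model of the change, assuming a mocked struct page with
      only a flags word, a head pointer and a huge-page marker; the real
      code uses compound_head() and the page-flag helpers, and
      action_flags() is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down page: flags, a head pointer (self for a
 * normal page) and a huge-page marker. */
struct page {
	unsigned long flags;
	struct page *head;
	bool huge;
};

/* Flags used to pick a recovery action: for a huge page, take a copy of
 * the head page's flags, since tail pages carry little useful state. */
static unsigned long action_flags(const struct page *p)
{
	return p->huge ? p->head->flags : p->flags;
}
```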
      
      Fixes: 524fca1e ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages")
      Link: http://lkml.kernel.org/r/20170524130204.21845-1-james.morse@arm.com
      Signed-off-by: James Morse <james.morse@arm.com>
      Tested-by: Punit Agrawal <punit.agrawal@arm.com>
      Acked-by: Punit Agrawal <punit.agrawal@arm.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5ac447d2
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Three small fixes for recently merged code:
      
         - remove a spurious WARN_ON when a PCI device has no of_node, it's
           allowed in some circumstances for there to be no of_node.
      
         - fix the offset for store EOI MMIOs in the XIVE interrupt
           controller.
      
         - fix non-const WARN_ONs which were becoming BUGs due to them losing
           BUGFLAG_WARNING in a recent cleanup patch.
      
        Thanks to: Alexey Kardashevskiy, Alistair Popple, Benjamin
        Herrenschmidt"
      
      * tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
        powerpc/xive: Fix offset for store EOI MMIOs
        powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
    • Merge tag 'perf-urgent-for-mingo-4.12-20170616' of... · 531c221d
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-4.12-20170616' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      - Fix probing of precise_ip level for default cycles event, which
        broke recently on x86_64 when its arch code started treating
        requests for precise samples as invalid when not sampling
        (i.e. when attr.sample_period == 0).
      
        This also fixes another problem in s/390 where the precision
        probing with sample_period == 0 returned precise_ip > 0, that
        then, when setting up the real cycles event (not probing) would
        return EOPNOTSUPP for precise_ip > 0 (as determined previously
        by probing) and sample_period > 0.
      
        These problems resulted in attr_precise not being set to the
        highest precision available on x86_64 when no event was specified,
        i.e. the canonical:
      
      	perf record ./workload
      
        would end up using attr.precise_ip = 0. As a workaround this would
        need to be done:
      
      	perf record -e cycles:P ./workload
      
        And on s/390 it would plain not work, requiring using:
      
              perf record -e cycles ./workload
      
        as a workaround.  (Arnaldo Carvalho de Melo)
      
      - Fix perf build with ARCH=x86_64, when ARCH should be transformed
        into ARCH=x86, just like with the main kernel Makefile and
        tools/objtool's, i.e. use SRCARCH. (Jiada Wang)
      
      - Avoid accessing uninitialized data structures when unwinding with
        elfutils's libdw, making it more closely mimic libunwind's unwinder.
        (Milian Wolff)
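      The probing problem in the first item can be modelled in a few
      lines. This sketch assumes a mocked event-open function that, like
      the x86 behaviour described above, rejects precise_ip > 0 when
      sample_period == 0; the struct and functions are hypothetical and
      not perf's real API, and probing from the highest level downward
      is an assumption. Probing with a zero sample period always falls
      through to 0, while a non-zero period finds the real maximum.

```c
#include <stdint.h>

/* Hypothetical attribute subset used by the model. */
struct attr_model {
	uint64_t sample_period;
	int precise_ip;
};

/* Mock of an x86-like kernel check; returns 0 on success, -1 on error. */
static int mock_event_open(const struct attr_model *attr, int max_precise)
{
	if (attr->precise_ip > 0 && attr->sample_period == 0)
		return -1;   /* precise but not sampling: rejected */
	if (attr->precise_ip > max_precise)
		return -1;   /* level not supported by the hardware */
	return 0;
}

/* Probe the highest working precise_ip, starting at 3 and counting down. */
static int probe_max_precise(uint64_t sample_period, int max_precise)
{
	struct attr_model attr = { .sample_period = sample_period };

	for (attr.precise_ip = 3; attr.precise_ip > 0; attr.precise_ip--)
		if (mock_event_open(&attr, max_precise) == 0)
			break;
	return attr.precise_ip;
}
```

      Here probe_max_precise(0, 3) returns 0, mirroring the broken
      default, while probe_max_precise(1, 3) returns 3.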
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf unwind: Report module before querying isactivation in dwfl unwind · 9126cbba
      Milian Wolff authored
      The PC returned by dwfl_frame_pc() may map into a not-yet-reported
      module. We have to report it before we continue unwinding. But when we
      query for the isactivation flag in dwfl_frame_pc, libdw will actually do
      one more unwinding step internally which can then break and lead to
      missed frames or broken stacks.
      
      With libunwind we get e.g.:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
        heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	           f5a1c QGuiApplicationPrivate::createPlatformIntegration (/usr/lib/libQt5Gui.so.5.8.0)
      	           f650c QGuiApplicationPrivate::createEventDispatcher (/usr/lib/libQt5Gui.so.5.8.0)
      	          298524 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      ~~~~~
      
      Note the two frames 1589e8 and 78622 in the first sample. These are
      missing when unwinding with libdw. The second sample's breakage is
      more obvious:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
        heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	          723dbf [unknown] ([unknown])
      ~~~~~
      
      This patch fixes the issue by reporting the module before querying
      isactivation, so the libdw unwinder now mimics libunwind's behavior
      more closely.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170602143753.16907-2-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Linus Torvalds
      Merge tag 'configfs-for-4.12' of git://git.infradead.org/users/hch/configfs · ab2789b7
      Linus Torvalds authored
      Pull configfs updates from Christoph Hellwig:
       "A fix from Nic for a race seen in production (including a stable tag).
      
        And while I'm sending you this I'm also sneaking in a trivial new
        helper from Bart so that we don't need inter-tree dependencies for the
        next merge window"
      
      * tag 'configfs-for-4.12' of git://git.infradead.org/users/hch/configfs:
        configfs: Introduce config_item_get_unless_zero()
        configfs: Fix race between create_link and configfs_rmdir