1. 16 Aug, 2016 1 commit
    • PM / hibernate: Fix rtree_next_node() to avoid walking off list ends · 924d8696
      James Morse authored
      rtree_next_node() walks the linked list of leaf nodes to find the next
      block of pages in the struct memory_bitmap. If it walks off the end of
      the list of nodes, it walks the list of memory zones to find the next
      region of memory. If it walks off the end of the list of zones, it
      returns false.
      
      This leaves the struct bm_position's node and zone pointers pointing
      at their respective struct list_heads in struct mem_zone_bm_rtree.
      
      memory_bm_find_bit() uses struct bm_position's node and zone pointers
      to avoid walking lists and trees if the next bit appears in the same
      node/zone. It handles these values being stale.
      
      Swap rtree_next_node()'s 'step then test' to 'test-next then step'.
      This means that if we reach the end of memory, we return false and
      leave the node and zone pointers as they were.
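
The difference between the two orderings can be sketched as follows. This is a minimal illustration and not the kernel code: `struct node`, `struct position` and both function names are hypothetical, the list is NULL-terminated rather than a circular list_head, and the zone level is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a leaf node of the bitmap. */
struct node {
	struct node *next;		/* NULL marks the end of the list */
	unsigned long start_pfn;
};

/* Hypothetical stand-in for the cached position. */
struct position {
	struct node *cur;		/* reused by later lookups */
};

/* 'step then test': on failure, pos->cur has already left the list. */
bool next_node_step_then_test(struct position *pos)
{
	assert(pos && pos->cur);
	pos->cur = pos->cur->next;
	return pos->cur != NULL;
}

/* 'test-next then step': on failure, pos->cur still names the last node. */
bool next_node_test_then_step(struct position *pos)
{
	assert(pos && pos->cur);
	if (pos->cur->next == NULL)
		return false;
	pos->cur = pos->cur->next;
	return true;
}
```

With the second variant, a lookup that reuses the cached position after the walk has finished still sees a valid node.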
      
      This fixes a panic on resume using AMD Seattle with 64K pages:
      [    6.868732] Freezing user space processes ... (elapsed 0.000 seconds) done.
      [    6.875753] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
      [    6.896453] PM: Using 3 thread(s) for decompression.
      [    6.896453] PM: Loading and decompressing image data (5339 pages)...
      [    7.318890] PM: Image loading progress:   0%
      [    7.323395] Unable to handle kernel paging request at virtual address 00800040
      [    7.330611] pgd = ffff000008df0000
      [    7.334003] [00800040] *pgd=00000083fffe0003, *pud=00000083fffe0003, *pmd=00000083fffd0003, *pte=0000000000000000
      [    7.344266] Internal error: Oops: 96000005 [#1] PREEMPT SMP
      [    7.349825] Modules linked in:
      [    7.352871] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G        W I     4.8.0-rc1 #4737
      [    7.360512] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
      [    7.369109] task: ffff8003c0220000 task.stack: ffff8003c0280000
      [    7.375020] PC is at set_bit+0x18/0x30
      [    7.378758] LR is at memory_bm_set_bit+0x24/0x30
      [    7.383362] pc : [<ffff00000835bbc8>] lr : [<ffff0000080faf18>] pstate: 60000045
      [    7.390743] sp : ffff8003c0283b00
      [    7.473551]
      [    7.475031] Process swapper/0 (pid: 1, stack limit = 0xffff8003c0280020)
      [    7.481718] Stack: (0xffff8003c0283b00 to 0xffff8003c0284000)
      [    7.800075] Call trace:
      [    7.887097] [<ffff00000835bbc8>] set_bit+0x18/0x30
      [    7.891876] [<ffff0000080fb038>] duplicate_memory_bitmap.constprop.38+0x54/0x70
      [    7.899172] [<ffff0000080fcc40>] snapshot_write_next+0x22c/0x47c
      [    7.905166] [<ffff0000080fe1b4>] load_image_lzo+0x754/0xa88
      [    7.910725] [<ffff0000080ff0a8>] swsusp_read+0x144/0x230
      [    7.916025] [<ffff0000080fa338>] load_image_and_restore+0x58/0x90
      [    7.922105] [<ffff0000080fa660>] software_resume+0x2f0/0x338
      [    7.927752] [<ffff000008083350>] do_one_initcall+0x38/0x11c
      [    7.933314] [<ffff000008b40cc0>] kernel_init_freeable+0x14c/0x1ec
      [    7.939395] [<ffff0000087ce564>] kernel_init+0x10/0xfc
      [    7.944520] [<ffff000008082e90>] ret_from_fork+0x10/0x40
      [    7.949820] Code: d2800022 8b400c21 f9800031 9ac32043 (c85f7c22)
      [    7.955909] ---[ end trace 0024a5986e6ff323 ]---
      [    7.960529] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      Here struct mem_zone_bm_rtree's start_pfn has been returned instead of
      struct rtree_node's addr as the node/zone pointers are corrupt after
      we walked off the end of the lists during mark_unsafe_pages().
      
      This behaviour was exposed by commit 6dbecfd3 ("PM / hibernate:
      Simplify mark_unsafe_pages()"), which caused mark_unsafe_pages() to call
      duplicate_memory_bitmap(), which uses memory_bm_find_bit() after walking
      off the end of the memory bitmap.
      
      Fixes: 3a20cb17 (PM / Hibernate: Implement position keeping in radix tree)
      Signed-off-by: James Morse <james.morse@arm.com>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 15 Aug, 2016 1 commit
  3. 13 Aug, 2016 1 commit
  4. 12 Aug, 2016 1 commit
  5. 08 Aug, 2016 1 commit
  6. 02 Aug, 2016 1 commit
    • x86/power/64: Do not refer to __PAGE_OFFSET from assembly code · c226fab4
      Rafael J. Wysocki authored
      When CONFIG_RANDOMIZE_MEMORY is set on x86-64, __PAGE_OFFSET becomes
      a variable and using it as a symbol in the image memory restoration
      assembly code under core_restore_code is not correct any more.
      
      To avoid that problem, modify set_up_temporary_mappings() to compute
      the physical address of the temporary page tables and store it in
      temp_level4_pgt, so that the value of that variable is ready to be
      written into CR3.  Then, the assembly code doesn't have to worry
      about converting that value into a physical address and things work
      regardless of whether or not CONFIG_RANDOMIZE_MEMORY is set.
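
The idea of doing the conversion in C can be sketched roughly as below. This is an illustration under stated assumptions, not the patch itself: `page_offset_base` as a plain variable, its initial value, and the `_sketch` helpers are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in: with memory randomization the direct-map offset
 * is a runtime variable, so assembly cannot use it as a constant symbol.
 * The initial value is only an illustrative default.
 */
uint64_t page_offset_base = 0xffff880000000000ULL;

/* Physical address of the temporary top-level table, ready for CR3. */
uint64_t temp_level4_pgt;

/* Direct-map virtual address to physical address, computed in C. */
uint64_t virt_to_phys_sketch(uint64_t vaddr)
{
	assert(vaddr >= page_offset_base);
	return vaddr - page_offset_base;
}

/* Store the physical address so later code needs no conversion. */
void set_up_temporary_mappings_sketch(uint64_t pgd_vaddr)
{
	temp_level4_pgt = virt_to_phys_sketch(pgd_vaddr);
}
```

Whatever value the offset takes at boot, the consumer of `temp_level4_pgt` only ever sees a physical address.
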
      Reported-and-tested-by: Thomas Garnier <thgarnie@google.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  7. 29 Jul, 2016 1 commit
    • x86/power/64: Fix hibernation return address corruption · 4ce827b4
      Josh Poimboeuf authored
      In kernel bug 150021, a kernel panic was reported when restoring a
      hibernate image.  Only a picture of the oops was reported, so I can't
      paste the whole thing here.  But here are the most interesting parts:
      
        kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
        BUG: unable to handle kernel paging request at ffff8804615cfd78
        ...
        RIP: ffff8804615cfd78
        RSP: ffff8804615f0000
        RBP: ffff8804615cfdc0
        ...
        Call Trace:
         do_signal+0x23
         exit_to_usermode_loop+0x64
         ...
      
      The RIP is on the same page as RBP, so it apparently started executing
      on the stack.
      
      The bug was bisected to commit ef0f3ed5 (x86/asm/power: Create
      stack frames in hibernate_asm_64.S), which in retrospect seems quite
      dangerous, since that code saves and restores the stack pointer from a
      global variable ('saved_context').
      
      There are a lot of moving parts in the hibernate save and restore paths,
      so I don't know exactly what caused the panic.  Presumably, a FRAME_END
      was executed without the corresponding FRAME_BEGIN, or vice versa.  That
      would corrupt the return address on the stack and would be consistent
      with the details of the above panic.
      
      [ rjw: One major problem is that by the time the FRAME_BEGIN in
        restore_registers() is executed, the stack pointer value may not
        be valid any more.  Namely, the stack area pointed to by it
        previously may have been overwritten by some image memory contents
        and that page frame may now be used for whatever different purpose
        it had been allocated for before hibernation.  In that case, the
        FRAME_BEGIN will corrupt that memory. ]
      
      Instead of doing the frame pointer save/restore around the bounds of the
      affected functions, just do it around the call to swsusp_save().
      
      That has the same effect of ensuring that if swsusp_save() sleeps, the
      frame pointers will be correct.  It's also a much more obviously safe
      way to do it than the original patch.  And objtool still doesn't report
      any warnings.
      
      Fixes: ef0f3ed5 (x86/asm/power: Create stack frames in hibernate_asm_64.S)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      Reported-by: Andre Reinke <andre.reinke@mailbox.org>
      Tested-by: Andre Reinke <andre.reinke@mailbox.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 22 Jul, 2016 1 commit
    • PM / hibernate: Introduce test_resume mode for hibernation · fe12c00d
      Chen Yu authored
      test_resume mode verifies that the snapshot data written to the
      swap device can be successfully restored to memory. It eases
      debugging of hibernation, since this mode bypasses not only the
      BIOS/bootloader, but also the system re-initialization.
      
      To avoid the risk of breaking the filesystem on persistent storage,
      this patch resumes the image with tasks frozen.
      
      For example:
      echo test_resume > /sys/power/disk
      echo disk > /sys/power/state
      
      [  187.306470] PM: Image saving progress:  70%
      [  187.395298] PM: Image saving progress:  80%
      [  187.476697] PM: Image saving progress:  90%
      [  187.554641] PM: Image saving done.
      [  187.558896] PM: Wrote 594600 kbytes in 0.90 seconds (660.66 MB/s)
      [  187.566000] PM: S|
      [  187.589742] PM: Basic memory bitmaps freed
      [  187.594694] PM: Checking hibernation image
      [  187.599865] PM: Image signature found, resuming
      [  187.605209] PM: Loading hibernation image.
      [  187.665753] PM: Basic memory bitmaps created
      [  187.691397] PM: Using 3 thread(s) for decompression.
      [  187.691397] PM: Loading and decompressing image data (148650 pages)...
      [  187.889719] PM: Image loading progress:   0%
      [  188.100452] PM: Image loading progress:  10%
      [  188.244781] PM: Image loading progress:  20%
      [  189.057305] PM: Image loading done.
      [  189.068793] PM: Image successfully loaded
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  9. 15 Jul, 2016 1 commit
    • x86 / hibernate: Use hlt_play_dead() when resuming from hibernation · 406f992e
      Rafael J. Wysocki authored
      On Intel hardware, native_play_dead() uses mwait_play_dead() by
      default and only falls back to the other methods if that fails.
      That also happens during resume from hibernation, when the restore
      (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
      except for the boot one offline.
      
      However, that is problematic, because the address passed to
      __monitor() in mwait_play_dead() is likely to be written to in the
      last phase of hibernate image restoration and that causes the "dead"
      CPU to start executing instructions again.  Unfortunately, the page
      containing the address in that CPU's instruction pointer may not be
      valid any more at that point.
      
      First, that page may have been overwritten with image kernel memory
      contents already, so the instructions the CPU attempts to execute may
      simply be invalid.  Second, the page tables previously used by that
      CPU may have been overwritten by image kernel memory contents, so the
      address in its instruction pointer is impossible to resolve then.
      
      A report from Varun Koyyalagunta and investigation carried out by
      Chen Yu show that the latter sometimes happens in practice.
      
      To prevent it from happening, temporarily change the smp_ops.play_dead
      pointer during resume from hibernation so that it points to a special
      "play dead" routine which uses hlt_play_dead() and avoids the
      inadvertent "revivals" of "dead" CPUs this way.
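
The temporary pointer swap can be sketched like this. It is a user-space illustration only: `struct smp_ops_sketch`, the `_sketch` routines, `offline_cpus_for_resume()` and `fake_offline()` are hypothetical names, and the real routines of course halt a CPU rather than set a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down version of an ops table with one callback. */
struct smp_ops_sketch {
	void (*play_dead)(void);
};

int last_method;	/* records which routine ran: 1 = mwait, 2 = hlt */

void mwait_play_dead_sketch(void) { last_method = 1; }
void hlt_play_dead_sketch(void)   { last_method = 2; }

struct smp_ops_sketch smp_ops = { .play_dead = mwait_play_dead_sketch };

/*
 * Point play_dead at the hlt-based routine only while the CPUs are
 * taken offline for resume, then restore the previous pointer.
 */
int offline_cpus_for_resume(int (*take_cpus_offline)(void))
{
	void (*saved)(void) = smp_ops.play_dead;
	int ret;

	assert(take_cpus_offline != NULL);
	smp_ops.play_dead = hlt_play_dead_sketch;
	ret = take_cpus_offline();
	smp_ops.play_dead = saved;
	return ret;
}

/* Hypothetical stand-in for offlining: each CPU calls play_dead. */
int fake_offline(void)
{
	smp_ops.play_dead();
	return 0;
}
```

Restoring the saved pointer keeps the lower-power default for ordinary CPU offlining outside of resume.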
      
      A slightly unpleasant consequence of this change is that if the
      system is hibernated with one or more CPUs offline, it will generally
      draw more power after resume than it did before hibernation, because
      the physical state entered by CPUs via hlt_play_dead() is higher-power
      than the mwait_play_dead() one in the majority of cases.  It is
      possible to work around this, but it is unclear how much of a problem
      that's going to be in practice, so the workaround will be implemented
      later if it turns out to be necessary.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
      Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
      Original-by: Chen Yu <yu.c.chen@intel.com>
      Tested-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
  10. 10 Jul, 2016 1 commit
    • PM / hibernate: Image data protection during restoration · 4c0b6c10
      Rafael J. Wysocki authored
      Make it possible to protect all pages holding image data during
      hibernate image restoration by setting them read-only (so as to
      catch attempts to write to those pages after image data have been
      stored in them).
      
      This adds overhead to image restoration code (it may cause large
      page mappings to be split as a result of page flags changes) and
      the errors it protects against should never happen in theory, so
      the feature is only active after passing hibernate=protect_image
      to the command line of the restore kernel.
      
      Also, it is only built if CONFIG_DEBUG_RODATA is set.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  11. 09 Jul, 2016 4 commits
  12. 08 Jul, 2016 1 commit
  13. 01 Jul, 2016 4 commits
    • PM / hibernate: Recycle safe pages after image restoration · 307c5971
      Rafael J. Wysocki authored
      One of the memory bitmaps used by the hibernation image restoration
      code is freed after the image has been loaded.
      
      That is not quite efficient, though, because the memory pages used
      for building that bitmap are known to be safe (ie. they were not
      used by the image kernel before hibernation) and the arch-specific
      code finalizing the image restoration may need them.  In that case
      it needs to allocate those pages again via the memory management
      subsystem, check if they are really safe again by consulting the
      other bitmaps and so on.
      
      To avoid that, recycle those pages by putting them into the global
      list of known safe pages so that they can be given to the arch code
      right away when necessary.
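
Recycling into a global list can be sketched as below. This is a simplified illustration, assuming a singly linked list threaded through the pages themselves; the names are chosen for the sketch and the source above does not confirm them as the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* A free page doubles as a list element (simplified stand-in). */
struct linked_page {
	struct linked_page *next;
};

/* Global list of pages already known to be safe. */
struct linked_page *safe_pages_list;

/* Put a known-safe page on the list instead of freeing it. */
void recycle_safe_page(void *page_address)
{
	struct linked_page *lp = page_address;

	assert(lp != NULL);
	lp->next = safe_pages_list;
	safe_pages_list = lp;
}

/* Hand out a safe page right away; NULL means the caller must allocate. */
void *get_safe_page_sketch(void)
{
	struct linked_page *lp = safe_pages_list;

	if (lp)
		safe_pages_list = lp->next;
	return lp;
}
```

A consumer that finds the list non-empty skips both the allocation and the safety check against the other bitmaps.
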
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • PM / hibernate: Simplify mark_unsafe_pages() · 6dbecfd3
      Rafael J. Wysocki authored
      Rework mark_unsafe_pages() to use a simpler method of clearing
      all bits in free_pages_map and to set the bits for the "unsafe"
      pages (ie. pages that were used by the image kernel before
      hibernation) with the help of duplicate_memory_bitmap().
      
      For this purpose, move the pfn_valid() check from mark_unsafe_pages()
      to unpack_orig_pfns() where the "unsafe" pages are discovered.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • PM / hibernate: Do not free preallocated safe pages during image restore · 9c744481
      Rafael J. Wysocki authored
      The core image restoration code preallocates some safe pages
      (ie. pages that weren't used by the image kernel before hibernation)
      for future use before allocating the bulk of memory for loading the
      image data.  Those safe pages are then freed so they can be allocated
      again (with the memory management subsystem's help).  That's done to
      ensure that there will be enough safe pages for temporary data
      structures needed during image restoration.
      
      However, it is not really necessary to free those pages after they
      have been allocated.  They can be added to the (global) list of
      safe pages right away and then picked up from there when needed
      without freeing.
      
      That reduces the overhead related to using safe pages, especially
      in the arch-specific code, so modify the code accordingly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • PM / suspend: show workqueue state in suspend flow · 7b776af6
      Roger Lu authored
      If a freezable workqueue aborts the suspend flow, show the
      workqueue state for debugging purposes.
      Signed-off-by: Roger Lu <roger.lu@mediatek.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  14. 30 Jun, 2016 1 commit
    • x86/power/64: Fix kernel text mapping corruption during image restoration · 65c0554b
      Rafael J. Wysocki authored
      Logan Gunthorpe reports that hibernation stopped working reliably for
      him after commit ab76f7b4 (x86/mm: Set NX on gap between __ex_table
      and rodata).
      
      That turns out to be a consequence of a long-standing issue with the
      64-bit image restoration code on x86, which is that the temporary
      page tables set up by it to avoid page tables corruption when the
      last bits of the image kernel's memory contents are copied into
      their original page frames re-use the boot kernel's text mapping,
      but that mapping may very well get corrupted just like any other
      part of the page tables.  Of course, if that happens, the final
      jump to the image kernel's entry point will go to nowhere.
      
      The exact reason why commit ab76f7b4 matters here is that it
      sometimes causes a PMD of a large page to be split into PTEs
      that are allocated dynamically and get corrupted during image
      restoration as described above.
      
      To fix that issue, note that the code copying the last bits of the
      image kernel's memory contents to the page frames occupied by them
      previously doesn't use the kernel text mapping, because it runs from
      a special page covered by the identity mapping set up for that code
      from scratch.  Hence, the kernel text mapping is only needed before
      that code starts to run and then it will only be used just for the
      final jump to the image kernel's entry point.
      
      Accordingly, the temporary page tables set up in swsusp_arch_resume()
      on x86-64 need to contain the kernel text mapping too.  That mapping
      is only going to be used for the final jump to the image kernel, so
      it only needs to cover the image kernel's entry point, because the
      first thing the image kernel does after getting control back is to
      switch over to its own original page tables.  Moreover, the virtual
      address of the image kernel's entry point in that mapping has to be
      the same as the one mapped by the image kernel's page tables.
      
      With that in mind, modify the x86-64's arch_hibernation_header_save()
      and arch_hibernation_header_restore() routines to pass the physical
      address of the image kernel's entry point (in addition to its virtual
      address) to the boot kernel (a small piece of assembly code involved
      in passing the entry point's virtual address to the image kernel is
      not necessary any more after that, so drop it).  Update RESTORE_MAGIC
      too to reflect the image header format change.
      
      Next, in set_up_temporary_mappings(), use the physical and virtual
      addresses of the image kernel's entry point passed in the image
      header to set up a minimum kernel text mapping (using memory pages
      that won't be overwritten by the image kernel's memory contents) that
      will map those addresses to each other as appropriate.
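
Locating the page table slots for such a minimum mapping can be sketched as follows. This is an illustration assuming 4-level x86-64 paging with 512-entry tables and a single 2 MiB large-page mapping; the struct, macros and function name are hypothetical and the source does not state the mapping granularity.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_TABLE	512u
#define PGD_SHIFT	39
#define PUD_SHIFT	30
#define PMD_SHIFT	21
#define PMD_SIZE	(1ULL << PMD_SHIFT)

/* Hypothetical description of the one large-page mapping needed. */
struct text_mapping_slot {
	unsigned int pgd, pud, pmd;	/* table indices for the virtual address */
	uint64_t phys_base;		/* 2 MiB-aligned physical base to map */
};

struct text_mapping_slot entry_point_slot(uint64_t jump_vaddr,
					  uint64_t jump_paddr)
{
	struct text_mapping_slot s;

	/* Both addresses must share the same offset within the large page. */
	assert((jump_vaddr & (PMD_SIZE - 1)) == (jump_paddr & (PMD_SIZE - 1)));

	s.pgd = (jump_vaddr >> PGD_SHIFT) & (PTRS_PER_TABLE - 1);
	s.pud = (jump_vaddr >> PUD_SHIFT) & (PTRS_PER_TABLE - 1);
	s.pmd = (jump_vaddr >> PMD_SHIFT) & (PTRS_PER_TABLE - 1);
	s.phys_base = jump_paddr & ~(PMD_SIZE - 1);
	return s;
}
```

Only the three tables on this path need to live in safe pages, since nothing else in the kernel text is used before the jump.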
      
      This makes the concern about the possible corruption of the original
      boot kernel text mapping go away and if the minimum kernel text
      mapping used for the final jump marks the image kernel's entry point
      memory as executable, the jump to it is guaranteed to succeed.
      
      Fixes: ab76f7b4 (x86/mm: Set NX on gap between __ex_table and rodata)
      Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
      Reported-by: Logan Gunthorpe <logang@deltatee.com>
      Reported-and-tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  15. 27 Jun, 2016 2 commits
  16. 26 Jun, 2016 1 commit
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 2ac9b973
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two straightforward fixes.
      
        One is a concurrency issue only affecting SAS connected SATA drives,
        but which could hang the storage subsystem if it triggers (because the
        outstanding command count on error never goes back to zero) and the
        other is a NO_TAG fallout from the switch to hostwide tags which
        causes the system to crash on module insertion (we've checked
        carefully and only the 53c700 family of drivers is vulnerable to this
        issue)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        53c700: fix BUG on untagged commands
        scsi: fix race between simultaneous decrements of ->host_failed
  17. 25 Jun, 2016 17 commits
    • Merge branch 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · da2f6aba
      Linus Torvalds authored
      Pull btrfs fixes part 2 from Chris Mason:
       "This has one patch from Omar to bring iterate_shared back to btrfs.
      
        We have a tree of work we queue up for directory items and it doesn't
        lend itself well to shared access.  While we're cleaning it up, Omar
        has changed things to use an exclusive lock when there are delayed
        items"
      
      * 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
    • Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · b971712a
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "I have a two part pull this time because one of the patches Dave
        Sterba collected needed to be against v4.7-rc2 or higher (we used
        rc4).  I try to make my for-linus-xx branch testable on top of the
        last major so we can hand fixes to people on the list more easily, so
        I've split this pull in two.
      
        This first part has some fixes and two performance improvements that
        we've been testing for some time.
      
        Josef's two performance fixes are most notable.  The transid tracking
        patch makes a big improvement on pretty much every workload"
      
      * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: Force stripesize to the value of sectorsize
        btrfs: fix disk_i_size update bug when fallocate() fails
        Btrfs: fix error handling in map_private_extent_buffer
        Btrfs: fix error return code in btrfs_init_test_fs()
        Btrfs: don't do nocow check unless we have to
        btrfs: fix deadlock in delayed_ref_async_start
        Btrfs: track transid for delayed ref flushing
    • Merge tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · ca83a55c
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Again pretty calm weeks: we've had only a few trivial / stable
        HD-audio fixes in addition to a possible race fix for snd-dummy driver
        spotted by syzkaller"
      
      * tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: dummy: Fix a use-after-free at closing
        ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
        ALSA: hda - Fix the headset mic jack detection on Dell machine
        ALSA: hda/tegra: iomem fixups for sparse warnings
        ALSA: hdac_regmap - fix the register access for runtime PM
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9a949a98
      Linus Torvalds authored
      Pull x86 kprobe fix from Thomas Gleixner:
       "A single fix clearing the TF bit when a fault is single stepped"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kprobes/x86: Clear TF bit in fault on single-stepping
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 57801c1b
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A couple of scheduler fixes:
      
         - force watchdog reset while processing sysrq-w
      
         - fix a deadlock when enabling trace events in the scheduler
      
         - fixes to the throttled next buddy logic
      
         - fixes for the average accounting (missing serialization and
           underflow handling)
      
         - allow kernel threads for fallback to online but not active cpus"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Allow kthreads to fall back to online && !active cpus
        sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
        sched/fair: Initialize throttle_count for new task-groups lazily
        sched/fair: Fix cfs_rq avg tracking underflow
        kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
        sched/debug: Fix deadlock when enabling sched events
        sched/fair: Fix post_init_entity_util_avg() serialization
    • Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes · 02dbfc99
      Omar Sandoval authored
      Commit fe742fd4 ("Revert "btrfs: switch to ->iterate_shared()"")
      backed out the conversion to ->iterate_shared() for Btrfs because the
      delayed inode handling in btrfs_real_readdir() is racy. However, we can
      still do readdir in parallel if there are no delayed nodes.
      
      This is a temporary fix which upgrades the shared inode lock to an
      exclusive lock only when we have delayed items until we come up with a
      more complete solution. While we're here, rename the
      btrfs_{get,put}_delayed_items functions to make it very clear that
      they're just for readdir.
      
      Tested with xfstests and by doing a parallel kernel build:
      
      	while make tinyconfig && make -j4 && git clean -dqfx; do
      		:
      	done
      
      along with a bunch of parallel finds in another shell:
      
      	while true; do
      		for ((i=0; i<4; i++)); do
      			find . >/dev/null &
      		done
      		wait
      	done
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e3b22bc3
      Linus Torvalds authored
      Pull locking fix from Thomas Gleixner:
       "A single fix to address a race in the static key logic"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/static_key: Fix concurrent static_key_slow_inc()
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2de23071
      Linus Torvalds authored
      Pull irq fix from Thomas Gleixner:
       "A single fix for the fallout from the conversion of MIPS GIC to irq
        domains"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/mips-gic: Fix IRQs in gic_dev_domain
    • Merge tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 2f6e9747
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "mm/radix (Aneesh Kumar K.V):
         - Update to tlb functions ric argument
         - Flush page walk cache when freeing page table
         - Update Radix tree size as per ISA 3.0
      
        mm/hash (Aneesh Kumar K.V):
         - Use the correct PPP mask when updating HPTE
         - Don't add memory coherence if cache inhibited is set
      
        eeh (Gavin Shan):
         - Fix invalid cached PE primary bus
      
        bpf/jit (Naveen N. Rao):
         - Disable classic BPF JIT on ppc64le
      
        .. and fix faults caused by radix patching of SLB miss handler
        (Michael Ellerman)"
      
      * tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
        powerpc: Fix faults caused by radix patching of SLB miss handler
        powerpc/eeh: Fix invalid cached PE primary bus
        powerpc/mm/radix: Update Radix tree size as per ISA 3.0
        powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
        powerpc/mm/hash: Use the correct PPP mask when updating HPTE
        powerpc/mm/radix: Flush page walk cache when freeing page table
        powerpc/mm/radix: Update to tlb functions ric argument
    • Fix build break in fork.c when THREAD_SIZE < PAGE_SIZE · 9521d399
      Michael Ellerman authored
      Commit b235beea ("Clarify naming of thread info/stack allocators")
      breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:
      
        kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
        kernel/fork.c:355:8: error: assignment from incompatible pointer type
          stack = alloc_thread_stack_node(tsk, node);
          ^
      
      Fix it by renaming free_stack() to free_thread_stack(), and updating the
      return type of alloc_thread_stack_node().
      
      Fixes: b235beea ("Clarify naming of thread info/stack allocators")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'akpm' (patches from Andrew) · 086e3eb6
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "Two weeks worth of fixes here"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
        init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
        autofs: don't get stuck in a loop if vfs_write() returns an error
        mm/page_owner: avoid null pointer dereference
        tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
        fs/nilfs2: fix potential underflow in call to crc32_le
        oom, suspend: fix oom_reaper vs. oom_killer_disable race
        ocfs2: disable BUG assertions in reading blocks
        mm, compaction: abort free scanner if split fails
        mm: prevent KASAN false positives in kmemleak
        mm/hugetlb: clear compound_mapcount when freeing gigantic pages
        mm/swap.c: flush lru pvecs on compound page arrival
        memcg: css_alloc should return an ERR_PTR value on error
        memcg: mem_cgroup_migrate() may be called with irq disabled
        hugetlb: fix nr_pmds accounting with shared page tables
        Revert "mm: disable fault around on emulated access bit architecture"
        Revert "mm: make faultaround produce old ptes"
        mailmap: add Boris Brezillon's email
        mailmap: add Antoine Tenart's email
        mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
        mm: mempool: kasan: don't poot mempool objects in quarantine
        ...
      086e3eb6
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · aebe9bb8
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "This is the second batch of queued up rdma patches for this rc cycle.
      
        There isn't anything really major in here.  It's passed 0day,
        linux-next, and local testing across a wide variety of hardware.
        There are still a few known issues to be tracked down, but this should
        amount to the vast majority of the rdma RC fixes.
      
        Round two of 4.7 rc fixes:
      
         - A couple minor fixes to the rdma core
         - Multiple minor fixes to hfi1
         - Multiple minor fixes to mlx4/mlx5
         - A few minor fixes to i40iw"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (31 commits)
        IB/srpt: Reduce QP buffer size
        i40iw: Enable level-1 PBL for fast memory registration
        i40iw: Return correct max_fast_reg_page_list_len
        i40iw: Correct status check on i40iw_get_pble
        i40iw: Correct CQ arming
        IB/rdmavt: Correct qp_priv_alloc() return value test
        IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp
        IB/hfi1: Fix deadlock with txreq allocation slow path
        IB/mlx4: Prevent cross page boundary allocation
        IB/mlx4: Fix memory leak if QP creation failed
        IB/mlx4: Verify port number in flow steering create flow
        IB/mlx4: Fix error flow when sending mads under SRIOV
        IB/mlx4: Fix the SQ size of an RC QP
        IB/mlx5: Fix wrong naming of port_rcv_data counter
        IB/mlx5: Fix post send fence logic
        IB/uverbs: Initialize ib_qp_init_attr with zeros
        IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUID
        IB/core: Fix RoCE v1 multicast join logic issue
        IB/core: Fix no default GIDs when netdevice reregisters
        IB/hfi1: Send a pkey change event on driver pkey update
        ...
      aebe9bb8
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 3fb5e59c
      Linus Torvalds authored
      Pull HID fix from Jiri Kosina:
       "hiddev ioctl() validation fix from Scott Bauer"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands
      3fb5e59c
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus-v4.7-rc5' of... · 260eaba4
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fix from Guenter Roeck:
       "Improve fan type detection for dell-smm to prevent kernel hang"
      
      * tag 'hwmon-for-linus-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (dell-smm) Cache fan_type() calls and change fan detection
      260eaba4
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ed13fbbf
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Stable-candidate fix for a deadlock in ACPICA introduced during the
        4.5 development cycle by a commit attempting to improve the handling
        of AML code that doesn't belong to any namespace objects in a given
        definition block (Lv Zheng)"
      
      * tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading
      ed13fbbf
    • Linus Torvalds's avatar
      Merge tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 3522b35c
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Fix for a latent cpufreq driver bug uncovered by a recent ACPICA
        change and several fixes for the devfreq framework, including one fix
        for an issue introduced recently.
      
        Specifics:
      
         - Fix a latent initialization issue in the pcc-cpufreq driver
           (incorrect initial value of a structure field) that has been
           uncovered by a recent ACPICA commit (Mike Galbraith).
      
         - Add a missing notification in an update_devfreq() error code path
           forgotten by a recent devfreq commit (Chanwoo Choi).
      
         - Fix devfreq device frequency initialization (Lukasz Luba).
      
         - Fix an incorrect IS_ERR() check in the devfreq framework discovered
           by the Smatch checker (Dan Carpenter).
      
         - Drop two excessive put_device() calls from the devfreq framework
           (MyungJoo Ham, Cai Zhiyong).
      
         - Fix a possible memory leak in the devfreq framework and drop an
           unnecessary kfree() invocation from it (MyungJoo Ham)"
      
      * tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed
        cpufreq: pcc-cpufreq: Fix doorbell.access_width
        PM / devfreq: fix initialization of current frequency in last status
        PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check
        PM / devfreq: remove double put_device
        PM / devfreq: fix double call put_device
        PM / devfreq: fix duplicated kfree on devfreq pointer
        PM / devfreq: devm_kzalloc to have dev pointer more precisely
      3522b35c
    • Linus Torvalds's avatar
      Merge tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 032fd3e5
      Linus Torvalds authored
      Pull xen bug fixes from David Vrabel:
      
       - fix x86 PV dom0 crash during early boot on some hardware
      
       - fix two pciback bugs affecting certain devices
      
       - fix potential overflow when clearing page tables in x86 PV
      
      * tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen-pciback: return proper values during BAR sizing
        x86/xen: avoid m2p lookup when setting early page table entries
        xen/pciback: Fix conf_space read/write overlap check.
        x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
        xen/balloon: Fix declared-but-not-defined warning
      032fd3e5