1. 22 Jul, 2016 1 commit
    • Chen Yu's avatar
      PM / hibernate: Introduce test_resume mode for hibernation · fe12c00d
      Chen Yu authored
      test_resume mode is to verify if the snapshot data
      written to swap device can be successfully restored
      to memory. It is useful to ease the debugging process
      on hibernation, since this mode can not only bypass
      the BIOSes/bootloader, but also the system re-initialization.
      
      To avoid the risk to break the filesystm on persistent storage,
      this patch resumes the image with tasks frozen.
      
      For example:
      echo test_resume > /sys/power/disk
      echo disk > /sys/power/state
      
      [  187.306470] PM: Image saving progress:  70%
      [  187.395298] PM: Image saving progress:  80%
      [  187.476697] PM: Image saving progress:  90%
      [  187.554641] PM: Image saving done.
      [  187.558896] PM: Wrote 594600 kbytes in 0.90 seconds (660.66 MB/s)
      [  187.566000] PM: S|
      [  187.589742] PM: Basic memory bitmaps freed
      [  187.594694] PM: Checking hibernation image
      [  187.599865] PM: Image signature found, resuming
      [  187.605209] PM: Loading hibernation image.
      [  187.665753] PM: Basic memory bitmaps created
      [  187.691397] PM: Using 3 thread(s) for decompression.
      [  187.691397] PM: Loading and decompressing image data (148650 pages)...
      [  187.889719] PM: Image loading progress:   0%
      [  188.100452] PM: Image loading progress:  10%
      [  188.244781] PM: Image loading progress:  20%
      [  189.057305] PM: Image loading done.
      [  189.068793] PM: Image successfully loaded
      Suggested-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarChen Yu <yu.c.chen@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      fe12c00d
  2. 15 Jul, 2016 1 commit
    • Rafael J. Wysocki's avatar
      x86 / hibernate: Use hlt_play_dead() when resuming from hibernation · 406f992e
      Rafael J. Wysocki authored
      On Intel hardware, native_play_dead() uses mwait_play_dead() by
      default and only falls back to the other methods if that fails.
      That also happens during resume from hibernation, when the restore
      (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
      except for the boot one offline.
      
      However, that is problematic, because the address passed to
      __monitor() in mwait_play_dead() is likely to be written to in the
      last phase of hibernate image restoration and that causes the "dead"
      CPU to start executing instructions again.  Unfortunately, the page
      containing the address in that CPU's instruction pointer may not be
      valid any more at that point.
      
      First, that page may have been overwritten with image kernel memory
      contents already, so the instructions the CPU attempts to execute may
      simply be invalid.  Second, the page tables previously used by that
      CPU may have been overwritten by image kernel memory contents, so the
      address in its instruction pointer is impossible to resolve then.
      
      A report from Varun Koyyalagunta and investigation carried out by
      Chen Yu show that the latter sometimes happens in practice.
      
      To prevent it from happening, temporarily change the smp_ops.play_dead
      pointer during resume from hibernation so that it points to a special
      "play dead" routine which uses hlt_play_dead() and avoids the
      inadvertent "revivals" of "dead" CPUs this way.
      
      A slightly unpleasant consequence of this change is that if the
      system is hibernated with one or more CPUs offline, it will generally
      draw more power after resume than it did before hibernation, because
      the physical state entered by CPUs via hlt_play_dead() is higher-power
      than the mwait_play_dead() one in the majority of cases.  It is
      possible to work around this, but it is unclear how much of a problem
      that's going to be in practice, so the workaround will be implemented
      later if it turns out to be necessary.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371Reported-by: default avatarVarun Koyyalagunta <cpudebug@centtech.com>
      Original-by: default avatarChen Yu <yu.c.chen@intel.com>
      Tested-by: default avatarChen Yu <yu.c.chen@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      406f992e
  3. 10 Jul, 2016 1 commit
    • Rafael J. Wysocki's avatar
      PM / hibernate: Image data protection during restoration · 4c0b6c10
      Rafael J. Wysocki authored
      Make it possible to protect all pages holding image data during
      hibernate image restoration by setting them read-only (so as to
      catch attempts to write to those pages after image data have been
      stored in them).
      
      This adds overhead to image restoration code (it may cause large
      page mappings to be split as a result of page flags changes) and
      the errors it protects against should never happen in theory, so
      the feature is only active after passing hibernate=protect_image
      to the command line of the restore kernel.
      
      Also it only is built if CONFIG_DEBUG_RODATA is set.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4c0b6c10
  4. 09 Jul, 2016 4 commits
  5. 08 Jul, 2016 1 commit
  6. 01 Jul, 2016 4 commits
    • Rafael J. Wysocki's avatar
      PM / hibernate: Recycle safe pages after image restoration · 307c5971
      Rafael J. Wysocki authored
      One of the memory bitmaps used by the hibernation image restoration
      code is freed after the image has been loaded.
      
      That is not quite efficient, though, because the memory pages used
      for building that bitmap are known to be safe (ie. they were not
      used by the image kernel before hibernation) and the arch-specific
      code finalizing the image restoration may need them.  In that case
      it needs to allocate those pages again via the memory management
      subsystem, check if they are really safe again by consulting the
      other bitmaps and so on.
      
      To avoid that, recycle those pages by putting them into the global
      list of known safe pages so that they can be given to the arch code
      right away when necessary.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      307c5971
    • Rafael J. Wysocki's avatar
      PM / hibernate: Simplify mark_unsafe_pages() · 6dbecfd3
      Rafael J. Wysocki authored
      Rework mark_unsafe_pages() to use a simpler method of clearing
      all bits in free_pages_map and to set the bits for the "unsafe"
      pages (ie. pages that were used by the image kernel before
      hibernation) with the help of duplicate_memory_bitmap().
      
      For this purpose, move the pfn_valid() check from mark_unsafe_pages()
      to unpack_orig_pfns() where the "unsafe" pages are discovered.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6dbecfd3
    • Rafael J. Wysocki's avatar
      PM / hibernate: Do not free preallocated safe pages during image restore · 9c744481
      Rafael J. Wysocki authored
      The core image restoration code preallocates some safe pages
      (ie. pages that weren't used by the image kernel before hibernation)
      for future use before allocating the bulk of memory for loading the
      image data.  Those safe pages are then freed so they can be allocated
      again (with the memory management subsystem's help).  That's done to
      ensure that there will be enough safe pages for temporary data
      structures needed during image restoration.
      
      However, it is not really necessary to free those pages after they
      have been allocated.  They can be added to the (global) list of
      safe pages right away and then picked up from there when needed
      without freeing.
      
      That reduces the overhead related to using safe pages, especially
      in the arch-specific code, so modify the code accordingly.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9c744481
    • Roger Lu's avatar
      PM / suspend: show workqueue state in suspend flow · 7b776af6
      Roger Lu authored
      If freezable workqueue aborts suspend flow, show
      workqueue state for debug purpose.
      Signed-off-by: default avatarRoger Lu <roger.lu@mediatek.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      7b776af6
  7. 30 Jun, 2016 1 commit
    • Rafael J. Wysocki's avatar
      x86/power/64: Fix kernel text mapping corruption during image restoration · 65c0554b
      Rafael J. Wysocki authored
      Logan Gunthorpe reports that hibernation stopped working reliably for
      him after commit ab76f7b4 (x86/mm: Set NX on gap between __ex_table
      and rodata).
      
      That turns out to be a consequence of a long-standing issue with the
      64-bit image restoration code on x86, which is that the temporary
      page tables set up by it to avoid page tables corruption when the
      last bits of the image kernel's memory contents are copied into
      their original page frames re-use the boot kernel's text mapping,
      but that mapping may very well get corrupted just like any other
      part of the page tables.  Of course, if that happens, the final
      jump to the image kernel's entry point will go to nowhere.
      
      The exact reason why commit ab76f7b4 matters here is that it
      sometimes causes a PMD of a large page to be split into PTEs
      that are allocated dynamically and get corrupted during image
      restoration as described above.
      
      To fix that issue note that the code copying the last bits of the
      image kernel's memory contents to the page frames occupied by them
      previoulsy doesn't use the kernel text mapping, because it runs from
      a special page covered by the identity mapping set up for that code
      from scratch.  Hence, the kernel text mapping is only needed before
      that code starts to run and then it will only be used just for the
      final jump to the image kernel's entry point.
      
      Accordingly, the temporary page tables set up in swsusp_arch_resume()
      on x86-64 need to contain the kernel text mapping too.  That mapping
      is only going to be used for the final jump to the image kernel, so
      it only needs to cover the image kernel's entry point, because the
      first thing the image kernel does after getting control back is to
      switch over to its own original page tables.  Moreover, the virtual
      address of the image kernel's entry point in that mapping has to be
      the same as the one mapped by the image kernel's page tables.
      
      With that in mind, modify the x86-64's arch_hibernation_header_save()
      and arch_hibernation_header_restore() routines to pass the physical
      address of the image kernel's entry point (in addition to its virtual
      address) to the boot kernel (a small piece of assembly code involved
      in passing the entry point's virtual address to the image kernel is
      not necessary any more after that, so drop it).  Update RESTORE_MAGIC
      too to reflect the image header format change.
      
      Next, in set_up_temporary_mappings(), use the physical and virtual
      addresses of the image kernel's entry point passed in the image
      header to set up a minimum kernel text mapping (using memory pages
      that won't be overwritten by the image kernel's memory contents) that
      will map those addresses to each other as appropriate.
      
      This makes the concern about the possible corruption of the original
      boot kernel text mapping go away and if the the minimum kernel text
      mapping used for the final jump marks the image kernel's entry point
      memory as executable, the jump to it is guaraneed to succeed.
      
      Fixes: ab76f7b4 (x86/mm: Set NX on gap between __ex_table and rodata)
      Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2Reported-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reported-and-tested-by: default avatarBorislav Petkov <bp@suse.de>
      Tested-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      65c0554b
  8. 27 Jun, 2016 2 commits
  9. 26 Jun, 2016 1 commit
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 2ac9b973
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two straightforward fixes.
      
        One is a concurrency issue only affecting SAS connected SATA drives,
        but which could hang the storage subsystem if it triggers (because the
        outstanding command count on error never goes back to zero) and the
        other is a NO_TAG fallout from the switch to hostwide tags which
        causes the system to crash on module insertion (we've checked
        carefully and only the 53c700 family of drivers is vulnerable to this
        issue)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        53c700: fix BUG on untagged commands
        scsi: fix race between simultaneous decrements of ->host_failed
      2ac9b973
  10. 25 Jun, 2016 24 commits