Documentation: PM: sleep: Document system-wide suspend code flows

Add a document describing high-level system-wide suspend code flows in Linux. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Documentation: PM: sleep: Document system-wide suspend code flows
Add a document describing high-level system-wide suspend code flows in Linux. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
4506c531 · Rafael J. Wysocki · db96a759 · 4506c531 · 4506c531
Commit 4506c531 authored Apr 02, 2020 by Rafael J. Wysocki
Showing with 271 additions and 0 deletions

Documentation/admin-guide/pm/suspend-flows.rst Documentation/admin-guide/pm/suspend-flows.rst +270 -0

Documentation/admin-guide/pm/system-wide.rst Documentation/admin-guide/pm/system-wide.rst +1 -0

No files found.
--- a/Documentation/admin-guide/pm/suspend-flows.rst
+++ b/Documentation/admin-guide/pm/suspend-flows.rst
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+=========================
+System Suspend Code Flows
+=========================
+:Copyright: |copy| 2020 Intel Corporation
+:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+At least one global system-wide transition needs to be carried out for the
+system to get from the working state into one of the supported
+:doc:`sleep states <sleep-states>`.  Hibernation requires more than one
+transition to occur for this purpose, but the other sleep states, commonly
+referred to as *system-wide suspend* (or simply *system suspend*) states, need
+only one.
+For those sleep states, the transition from the working state of the system into
+the target sleep state is referred to as *system suspend* too (in the majority
+of cases, whether this means a transition or a sleep state of the system should
+be clear from the context) and the transition back from the sleep state into the
+working state is referred to as *system resume*.
+The kernel code flows associated with the suspend and resume transitions for
+different sleep states of the system are quite similar, but there are some
+significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
+and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
+:ref:`standby <standby>` sleep states.
+The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
+cannot be implemented without platform support and the difference between them
+boils down to the platform-specific actions carried out by the suspend and
+resume hooks that need to be provided by the platform driver to make them
+available.  Apart from that, the suspend and resume code flows for these sleep
+states are mostly identical, so they both together will be referred to as
+*platform-dependent suspend* states in what follows.
+.. _s2idle_suspend:
+Suspend-to-idle Suspend Code Flow
+=================================
+The following steps are taken in order to transition the system from the working
+state to the :ref:`suspend-to-idle <s2idle>` sleep state:
+ 1. Invoking system-wide suspend notifiers.
+    Kernel subsystems can register callbacks to be invoked when the suspend
+    transition is about to occur and when the resume transition has finished.
+    That allows them to prepare for the change of the system state and to clean
+    up after getting back to the working state.
+ 2. Freezing tasks.
+    Tasks are frozen primarily in order to avoid unchecked hardware accesses
+    from user space through MMIO regions or I/O registers exposed directly to
+    it and to prevent user space from entering the kernel while the next step
+    of the transition is in progress (which might have been problematic for
+    various reasons).
+    All user space tasks are intercepted as though they were sent a signal and
+    put into uninterruptible sleep until the end of the subsequent system resume
+    transition.
+    The kernel threads that choose to be frozen during system suspend for
+    specific reasons are frozen subsequently, but they are not intercepted.
+    Instead, they are expected to periodically check whether or not they need
+    to be frozen and to put themselves into uninterruptible sleep if so.  [Note,
+    however, that kernel threads can use locking and other concurrency controls
+    available in kernel space to synchronize themselves with system suspend and
+    resume, which can be much more precise than the freezing, so the latter is
+    not a recommended option for kernel threads.]
+ 3. Suspending devices and reconfiguring IRQs.
+    Devices are suspended in four phases called *prepare*, *suspend*,
+    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
+    information on what exactly happens in each phase).
+    Every device is visited in each phase, but typically it is not physically
+    accessed in more than two of them.
+    The runtime PM API is disabled for every device during the *late* suspend
+    phase and high-level ("action") interrupt handlers are prevented from being
+    invoked before the *noirq* suspend phase.
+    Interrupts are still handled after that, but they are only acknowledged to
+    interrupt controllers without performing any device-specific actions that
+    would be triggered in the working state of the system (those actions are
+    deferred till the subsequent system resume transition as described
+    `below <s2idle_resume_>`_).
+    IRQs associated with system wakeup devices are "armed" so that the resume
+    transition of the system is started when one of them signals an event.
+ 4. Freezing the scheduler tick and suspending timekeeping.
+    When all devices have been suspended, CPUs enter the idle loop and are put
+    into the deepest available idle state.  While doing that, each of them
+    "freezes" its own scheduler tick so that the timer events associated with
+    the tick do not occur until the CPU is woken up by another interrupt source.
+    The last CPU to enter the idle state also stops the timekeeping which
+    (among other things) prevents high resolution timers from triggering going
+    forward until the first CPU that is woken up restarts the timekeeping.
+    That allows the CPUs to stay in the deep idle state relatively long in one
+    go.
+    From this point on, the CPUs can only be woken up by non-timer hardware
+    interrupts.  If that happens, they go back to the idle state unless the
+    interrupt that woke up one of them comes from an IRQ that has been armed for
+    system wakeup, in which case the system resume transition is started.
+.. _s2idle_resume:
+Suspend-to-idle Resume Code Flow
+================================
+The following steps are taken in order to transition the system from the
+:ref:`suspend-to-idle <s2idle>` sleep state into the working state:
+ 1. Resuming timekeeping and unfreezing the scheduler tick.
+    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
+    leaves the idle state entered in the last step of the preceding suspend
+    transition, restarts the timekeeping (unless it has been restarted already
+    by another CPU that woke up earlier) and the scheduler tick on that CPU is
+    unfrozen.
+    If the interrupt that has woken up the CPU was armed for system wakeup,
+    the system resume transition begins.
+ 2. Resuming devices and restoring the working-state configuration of IRQs.
+    Devices are resumed in four phases called *noirq resume*, *early resume*,
+    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
+    information on what exactly happens in each phase).
+    Every device is visited in each phase, but typically it is not physically
+    accessed in more than two of them.
+    The working-state configuration of IRQs is restored after the *noirq* resume
+    phase and the runtime PM API is re-enabled for every device whose driver
+    supports it during the *early* resume phase.
+ 3. Thawing tasks.
+    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
+    transition are "thawed", which means that they are woken up from the
+    uninterruptible sleep that they went into at that time and user space tasks
+    are allowed to exit the kernel.
+ 4. Invoking system-wide resume notifiers.
+    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
+    and the same set of callbacks is invoked at this point, but a different
+    "notification type" parameter value is passed to them.
+Platform-dependent Suspend Code Flow
+====================================
+The following steps are taken in order to transition the system from the working
+state to platform-dependent suspend state:
+ 1. Invoking system-wide suspend notifiers.
+    This step is the same as step 1 of the suspend-to-idle suspend transition
+    described `above <s2idle_suspend_>`_.
+ 2. Freezing tasks.
+    This step is the same as step 2 of the suspend-to-idle suspend transition
+    described `above <s2idle_suspend_>`_.
+ 3. Suspending devices and reconfiguring IRQs.
+    This step is analogous to step 3 of the suspend-to-idle suspend transition
+    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
+    wakeup generally does not have any effect on the platform.
+    There are platforms that can go into a very deep low-power state internally
+    when all CPUs in them are in sufficiently deep idle states and all I/O
+    devices have been put into low-power states.  On those platforms,
+    suspend-to-idle can reduce system power very effectively.
+    On the other platforms, however, low-level components (like interrupt
+    controllers) need to be turned off in a platform-specific way (implemented
+    in the hooks provided by the platform driver) to achieve comparable power
+    reduction.
+    That usually prevents in-band hardware interrupts from waking up the system,
+    which must be done in a special platform-dependent way.  Then, the
+    configuration of system wakeup sources usually starts when system wakeup
+    devices are suspended and is finalized by the platform suspend hooks later
+    on.
+ 4. Disabling non-boot CPUs.
+    On some platforms the suspend hooks mentioned above must run in a one-CPU
+    configuration of the system (in particular, the hardware cannot be accessed
+    by any code running in parallel with the platform suspend hooks that may,
+    and often do, trap into the platform firmware in order to finalize the
+    suspend transition).
+    For this reason, the CPU offline/online (CPU hotplug) framework is used
+    to take all of the CPUs in the system, except for one (the boot CPU),
+    offline (typically, the CPUs that have been taken offline go into deep idle
+    states).
+    This means that all tasks are migrated away from those CPUs and all IRQs are
+    rerouted to the only CPU that remains online.
+ 5. Suspending core system components.
+    This prepares the core system components for (possibly) losing power going
+    forward and suspends the timekeeping.
+ 6. Platform-specific power removal.
+    This is expected to remove power from all of the system components except
+    for the memory controller and RAM (in order to preserve the contents of the
+    latter) and some devices designated for system wakeup.
+    In many cases control is passed to the platform firmware which is expected
+    to finalize the suspend transition as needed.
+Platform-dependent Resume Code Flow
+===================================
+The following steps are taken in order to transition the system from a
+platform-dependent suspend state into the working state:
+ 1. Platform-specific system wakeup.
+    The platform is woken up by a signal from one of the designated system
+    wakeup devices (which need not be an in-band hardware interrupt)  and
+    control is passed back to the kernel (the working configuration of the
+    platform may need to be restored by the platform firmware before the
+    kernel gets control again).
+ 2. Resuming core system components.
+    The suspend-time configuration of the core system components is restored and
+    the timekeeping is resumed.
+ 3. Re-enabling non-boot CPUs.
+    The CPUs disabled in step 4 of the preceding suspend transition are taken
+    back online and their suspend-time configuration is restored.
+ 4. Resuming devices and restoring the working-state configuration of IRQs.
+    This step is the same as step 2 of the suspend-to-idle suspend transition
+    described `above <s2idle_resume_>`_.
+ 5. Thawing tasks.
+    This step is the same as step 3 of the suspend-to-idle suspend transition
+    described `above <s2idle_resume_>`_.
+ 6. Invoking system-wide resume notifiers.
+    This step is the same as step 4 of the suspend-to-idle suspend transition
+    described `above <s2idle_resume_>`_.
--- a/Documentation/admin-guide/pm/system-wide.rst
+++ b/Documentation/admin-guide/pm/system-wide.rst
@@ -8,3 +8,4 @@ System-Wide Power Management
   :maxdepth: 2
   sleep-states
+   suspend-flows