    vfio/pci: Move the unused device into low power state with runtime PM · 7ab5e10e
    Abhishek Sahu authored
    Currently, power management support in the upstream vfio_pci_core
    based drivers is very limited. If there are no users of the device,
    the PCI device is moved into the D3hot state by writing directly
    into the PCI PM registers. The D3hot state helps save power, but
    zero power consumption can be achieved only in the D3cold state.
    D3cold cannot be reached with native PCI PM alone. It requires
    interaction with platform firmware, which is system-specific. To
    enter low power states (including D3cold), the runtime PM framework
    can be used. It interacts internally with PCI and platform firmware
    and puts the device into the lowest possible D-state.
    
    This patch registers vfio_pci_core based drivers with the
    runtime PM framework.
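    
    The usage counting that the points below rely on can be modelled
    in plain userspace C. This is only a sketch: 'struct model_dev' and
    the model_*() helpers are hypothetical names, not kernel APIs, and
    the model assumes that registration starts with one reference
    already held, standing in for the reference the PCI core holds
    around probe. The device is treated as suspended only while the
    count is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct device. */
enum model_state { MODEL_ACTIVE, MODEL_SUSPENDED };

struct model_dev {
    int usage_count;          /* stands in for the runtime PM usage count */
    bool disable_idle_d3;     /* stands in for the module parameter */
    enum model_state state;
};

/* Take a reference; a referenced device is always active. */
static void model_get(struct model_dev *d)
{
    d->usage_count++;
    d->state = MODEL_ACTIVE;
}

/* Drop a reference; suspend once nobody holds one. */
static void model_put(struct model_dev *d)
{
    assert(d->usage_count > 0);
    d->usage_count--;
    if (d->usage_count == 0)
        d->state = MODEL_SUSPENDED;
}

/*
 * Registration: start with one reference held (assumption of this
 * model) and drop it unless disable_idle_d3 is set.
 */
static void model_register(struct model_dev *d, bool disable_idle_d3)
{
    d->usage_count = 1;
    d->state = MODEL_ACTIVE;
    d->disable_idle_d3 = disable_idle_d3;
    if (!d->disable_idle_d3)
        model_put(d);
}

/* A user opens the device: balanced by model_disable(). */
static void model_enable(struct model_dev *d)
{
    model_get(d);
}

/* The last user closes the device: always drop the enable reference. */
static void model_disable(struct model_dev *d)
{
    model_put(d);
}
```

    In this model an unused device is suspended right after
    model_register(), unless disable_idle_d3 is set, in which case the
    retained reference keeps it active across enable/disable cycles.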
    
    1. The PCI core framework takes care of most of the runtime PM
       handling. To enable runtime PM, the PCI driver needs at least to
       decrement the usage count and to provide a 'struct dev_pm_ops'.
       The runtime suspend/resume callbacks are optional and are needed
       only for extra handling. There are now multiple vfio_pci_core
       based drivers. Instead of assigning the 'struct dev_pm_ops' in
       each individual parent driver, vfio_pci_core itself assigns it.
       Other drivers also assign the 'struct dev_pm_ops' inside a core
       layer (for example, wlcore_probe() and some sound drivers).
    
    2. This patch provides a stub implementation of 'struct dev_pm_ops'.
       A subsequent patch will provide the runtime suspend/resume
       callbacks. All config state saving and PCI power management
       handling is done by the PCI core framework itself inside its
       runtime suspend/resume callbacks (pci_pm_runtime_suspend() and
       pci_pm_runtime_resume()).
    
    3. Inside pci_reset_bus(), all the devices in dev_set need to be
       runtime resumed. vfio_pci_dev_set_pm_runtime_get() takes care
       of the runtime resume and its error handling.
    
    4. Inside vfio_pci_core_disable(), the usage count that was
       incremented in vfio_pci_core_enable() always needs to be
       decremented.
    
    5. Since the runtime PM framework provides the same functionality,
       directly writing into the PCI PM config register can be replaced
       with the runtime PM routines. Using runtime PM can also save
       more power.
    
       In the systems which do not support D3cold,
    
       With the existing implementation:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3hot
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D0
    
       With runtime PM:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3hot
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D3hot
    
       So, with runtime PM, the upstream bridge or root port also goes
       into a lower power state, which is not possible with the existing
       implementation.
    
       In the systems which support D3cold,
    
       With the existing implementation:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3hot
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D0
    
       With runtime PM:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3cold
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D3cold
    
       So, with runtime PM, both the PCI device and upstream bridge will
       go into D3cold state.
    
    6. If the 'disable_idle_d3' module parameter is set, runtime PM is
       still enabled, but in this case the usage count should not be
       decremented.
    
    7. The return value of vfio_pci_dev_set_try_reset() is now unused,
       so its return type can be changed to void.
    
    8. Use the runtime PM APIs in vfio_pci_core_sriov_configure().
       The device can be in a low power state either through runtime
       power management (when there is no user) or through a
       PCI_PM_CTRL register write by the user. In both cases, the PF
       should be moved to the D0 state. To prevent any runtime usage
       count mismatch, pci_num_vf() is called explicitly during disable.
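    
    The "runtime resume and its error handling" of point 3 can be
    illustrated with a userspace sketch. It assumes all-or-nothing
    semantics: if resuming one device in the set fails, the references
    already taken on the earlier devices are dropped again. This is an
    assumption made for illustration; 'struct set_dev' and the helpers
    below are hypothetical names, not the functions in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one device in the set. */
struct set_dev {
    int usage_count;   /* references currently held on the device */
    int resume_error;  /* nonzero: simulate a failed runtime resume */
};

/* Take a reference only when the simulated resume succeeds. */
static int set_dev_resume_and_get(struct set_dev *d)
{
    if (d->resume_error)
        return d->resume_error;
    d->usage_count++;
    return 0;
}

/* Drop one reference. */
static void set_dev_put(struct set_dev *d)
{
    assert(d->usage_count > 0);
    d->usage_count--;
}

/*
 * Resume every device in the set, or none of them.
 * Returns 0 on success or the first error encountered.
 */
static int dev_set_runtime_get(struct set_dev *devs, size_t n)
{
    size_t i;
    int ret = 0;

    for (i = 0; i < n; i++) {
        ret = set_dev_resume_and_get(&devs[i]);
        if (ret)
            goto unwind;
    }
    return 0;

unwind:
    /* Undo only the devices before the one that failed. */
    while (i-- > 0)
        set_dev_put(&devs[i]);
    return ret;
}
```

    A caller in this model would attempt the bus reset only when
    dev_set_runtime_get() returns 0, and would drop the references
    again once the reset is done.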
    
    Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
    Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_pci_core.c 59.6 KB