Commit 8e5c6995 authored by Abhishek Sahu's avatar Abhishek Sahu Committed by Alex Williamson

vfio: Increment the runtime PM usage count during IOCTL call

The vfio-pci based drivers will have runtime power management
support where the user can put the device into the low power state
and then PCI devices can go into the D3cold state. If the device is
in the low power state and the user issues any IOCTL, then the
device should be moved out of the low power state first. Once
the IOCTL is serviced, then it can go into the low power state again.
The runtime PM framework manages this with help of usage count.

One option was to add the runtime PM related API's inside vfio-pci
driver but some IOCTL (like VFIO_DEVICE_FEATURE) can follow a
different path and more IOCTL can be added in the future. Also, the
runtime PM will be added for vfio-pci based drivers variant currently,
but the other VFIO based drivers can use the same in the
future. So, this patch adds the runtime calls runtime-related API in
the top-level IOCTL function itself.

For the VFIO drivers which do not have runtime power management
support currently, the runtime PM API's won't be invoked. Only for
vfio-pci based drivers currently, the runtime PM API's will be invoked
to increment and decrement the usage count. In the vfio-pci drivers also,
the variant drivers can opt-out by incrementing the usage count during
device-open. The pm_runtime_resume_and_get() checks the device
current status and will return early if the device is already in the
ACTIVE state.

Taking this usage count incremented while servicing IOCTL will make
sure that the user won't put the device into the low power state when any
other IOCTL is being serviced in parallel. Let's consider the
following scenario:

 1. Some other IOCTL is called.
 2. The user has opened another device instance and called the IOCTL for
    low power entry.
 3. The low power entry IOCTL moves the device into the low power state.
 4. The other IOCTL finishes.

If we don't keep the usage count incremented then the device
access will happen between step 3 and 4 while the device has already
gone into the low power state.

The pm_runtime_resume_and_get() will be the first call so its error
should not be propagated to user space directly. For example, if
pm_runtime_resume_and_get() can return -EINVAL for the cases where the
user has passed the correct argument. So the
pm_runtime_resume_and_get() errors have been masked behind -EIO.
Signed-off-by: default avatarAbhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220829114850.4341-3-abhsahu@nvidia.comSigned-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
parent 385ecfdf
...@@ -32,6 +32,7 @@ ...@@ -32,6 +32,7 @@
#include <linux/vfio.h> #include <linux/vfio.h>
#include <linux/wait.h> #include <linux/wait.h>
#include <linux/sched/signal.h> #include <linux/sched/signal.h>
#include <linux/pm_runtime.h>
#include "vfio.h" #include "vfio.h"
#define DRIVER_VERSION "0.3" #define DRIVER_VERSION "0.3"
...@@ -1353,6 +1354,39 @@ static const struct file_operations vfio_group_fops = { ...@@ -1353,6 +1354,39 @@ static const struct file_operations vfio_group_fops = {
.release = vfio_group_fops_release, .release = vfio_group_fops_release,
}; };
/*
* Wrapper around pm_runtime_resume_and_get().
* Return error code on failure or 0 on success.
*/
static inline int vfio_device_pm_runtime_get(struct vfio_device *device)
{
struct device *dev = device->dev;
if (dev->driver && dev->driver->pm) {
int ret;
ret = pm_runtime_resume_and_get(dev);
if (ret) {
dev_info_ratelimited(dev,
"vfio: runtime resume failed %d\n", ret);
return -EIO;
}
}
return 0;
}
/*
* Wrapper around pm_runtime_put().
*/
static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
{
struct device *dev = device->dev;
if (dev->driver && dev->driver->pm)
pm_runtime_put(dev);
}
/* /*
* VFIO Device fd * VFIO Device fd
*/ */
...@@ -1673,15 +1707,27 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, ...@@ -1673,15 +1707,27 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
unsigned int cmd, unsigned long arg) unsigned int cmd, unsigned long arg)
{ {
struct vfio_device *device = filep->private_data; struct vfio_device *device = filep->private_data;
int ret;
ret = vfio_device_pm_runtime_get(device);
if (ret)
return ret;
switch (cmd) { switch (cmd) {
case VFIO_DEVICE_FEATURE: case VFIO_DEVICE_FEATURE:
return vfio_ioctl_device_feature(device, (void __user *)arg); ret = vfio_ioctl_device_feature(device, (void __user *)arg);
break;
default: default:
if (unlikely(!device->ops->ioctl)) if (unlikely(!device->ops->ioctl))
return -EINVAL; ret = -EINVAL;
return device->ops->ioctl(device, cmd, arg); else
ret = device->ops->ioctl(device, cmd, arg);
break;
} }
vfio_device_pm_runtime_put(device);
return ret;
} }
static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf, static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment