    vfio: Increment the runtime PM usage count during IOCTL call · 8e5c6995
    Abhishek Sahu authored
    The vfio-pci based drivers will have runtime power management
    support where the user can put the device into the low power state
    and then PCI devices can go into the D3cold state. If the device is
    in the low power state and the user issues any IOCTL, then the
    device should be moved out of the low power state first. Once
    the IOCTL is serviced, then it can go into the low power state again.
    The runtime PM framework manages this with the help of a usage count.
    
    One option was to add the runtime PM related APIs inside the vfio-pci
    driver, but some IOCTLs (like VFIO_DEVICE_FEATURE) can follow a
    different path and more IOCTLs can be added in the future. Also, runtime
    PM is currently being added only for the vfio-pci based driver variants,
    but the other VFIO based drivers can use the same mechanism in the
    future. So, this patch adds the runtime PM related API calls in
    the top-level IOCTL function itself.
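
The wrapping described above can be sketched in user space as follows. This is only a minimal model, not the kernel code: `struct mock_device`, `mock_resume_and_get()`, `mock_put()`, `mock_top_level_ioctl()` and `mock_handler()` are hypothetical names invented for illustration, and the real runtime PM calls are replaced by simple counter updates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a runtime PM usage count. */
struct mock_device {
    bool has_pm_ops;   /* models "the driver supports runtime PM" */
    bool suspended;    /* true while the device is in the low power state */
    int usage_count;   /* models the runtime PM usage count */
};

/* Models pm_runtime_resume_and_get(): wake the device, take a reference. */
static int mock_resume_and_get(struct mock_device *dev)
{
    dev->suspended = false;
    dev->usage_count++;
    return 0;
}

/* Models pm_runtime_put(): drop the reference taken above. */
static void mock_put(struct mock_device *dev)
{
    dev->usage_count--;
}

typedef long (*mock_ioctl_fn)(struct mock_device *dev, unsigned int cmd);

/*
 * Models the top-level IOCTL entry point: every command, whichever
 * handler it is routed to, runs with the usage count held.
 */
static long mock_top_level_ioctl(struct mock_device *dev, unsigned int cmd,
                                 mock_ioctl_fn handler)
{
    long ret;

    if (dev->has_pm_ops) {
        int err = mock_resume_and_get(dev);

        if (err)
            return err;
    }

    ret = handler(dev, cmd);

    if (dev->has_pm_ops)
        mock_put(dev);

    return ret;
}

/* Example handler: fails if the device is touched while suspended. */
static long mock_handler(struct mock_device *dev, unsigned int cmd)
{
    (void)cmd;
    return dev->suspended ? -1 : 0;
}
```

Because the get/put pair sits in the single entry point, a handler added later is covered without any change of its own, which is the motivation given above for not placing the calls inside each driver.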
    
    For the VFIO drivers which do not currently have runtime power management
    support, the runtime PM APIs won't be invoked. Currently, the runtime PM
    APIs will be invoked only for vfio-pci based drivers, to increment and
    decrement the usage count. Among the vfio-pci drivers,
    the variant drivers can opt out by incrementing the usage count during
    device-open. pm_runtime_resume_and_get() checks the device's
    current status and returns early if the device is already in the
    ACTIVE state.
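
A simplified user-space model of these two behaviours (early return when already active, and opt-out by holding a reference from device-open) might look like the sketch below. All names (`struct pm_model`, `model_resume_and_get()`, `model_put()`, `model_try_suspend()`, `model_open_opt_out()`) are hypothetical, and the rule "suspend is allowed only when the usage count is zero" is a deliberate simplification of the runtime PM framework.

```c
#include <stdbool.h>

enum model_state { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical model of a device's runtime PM bookkeeping. */
struct pm_model {
    enum model_state state;
    int usage_count;
    int resume_calls;   /* counts how often a real resume was needed */
};

/* Models pm_runtime_resume_and_get(): cheap when already ACTIVE. */
static int model_resume_and_get(struct pm_model *pm)
{
    pm->usage_count++;
    if (pm->state == MODEL_ACTIVE)
        return 0;           /* early return, nothing to resume */

    pm->resume_calls++;
    pm->state = MODEL_ACTIVE;
    return 0;
}

/* Models pm_runtime_put(): drop one reference. */
static void model_put(struct pm_model *pm)
{
    pm->usage_count--;
}

/* Simplified rule: suspend succeeds only with no references held. */
static bool model_try_suspend(struct pm_model *pm)
{
    if (pm->usage_count > 0)
        return false;

    pm->state = MODEL_SUSPENDED;
    return true;
}

/*
 * Models a variant driver opting out: it takes a reference at
 * device-open and keeps it, so the count never drops to zero.
 */
static void model_open_opt_out(struct pm_model *pm)
{
    model_resume_and_get(pm);
}
```

In this model, a device whose driver called `model_open_opt_out()` can never be suspended, while the per-IOCTL get/put pair stays cheap because it takes the early-return path.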
    
    Keeping this usage count incremented while servicing an IOCTL makes
    sure that the user can't put the device into the low power state while any
    other IOCTL is being serviced in parallel. Consider the
    following scenario:
    
     1. Some other IOCTL is called.
     2. The user has opened another device instance and called the IOCTL for
        low power entry.
     3. The low power entry IOCTL moves the device into the low power state.
     4. The other IOCTL finishes.
    
    If we don't keep the usage count incremented, then the device
    can be accessed between steps 3 and 4, after the device has already
    gone into the low power state.
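
The four steps can be replayed sequentially in a small user-space model to see the effect of holding the usage count. This is a sketch under stated assumptions: `struct fake_dev`, `fake_get()`, `fake_put()`, `fake_request_low_power()` and `run_scenario()` are hypothetical, the interleaving is simulated in a single thread, and the low power request is modelled as taking effect as soon as the usage count is zero.

```c
#include <stdbool.h>

/* Hypothetical device model for replaying the scenario. */
struct fake_dev {
    int usage_count;
    bool low_power;             /* device is currently in low power */
    bool low_power_requested;   /* user asked for low power entry */
};

/* Take a reference and make sure the device is awake. */
static void fake_get(struct fake_dev *d)
{
    d->usage_count++;
    d->low_power = false;
}

/* Drop a reference; a pending low power request may now take effect. */
static void fake_put(struct fake_dev *d)
{
    d->usage_count--;
    if (d->usage_count == 0 && d->low_power_requested)
        d->low_power = true;
}

/* Low power entry: only takes effect while nobody holds a reference. */
static void fake_request_low_power(struct fake_dev *d)
{
    d->low_power_requested = true;
    if (d->usage_count == 0)
        d->low_power = true;
}

/*
 * Replays steps 1-4. Returns true if the "other IOCTL" touched the
 * device while it was in the low power state.
 */
static bool run_scenario(bool hold_usage_count)
{
    struct fake_dev d = { 0, false, false };
    bool bad_access;

    if (hold_usage_count)
        fake_get(&d);               /* step 1: other IOCTL starts */

    fake_request_low_power(&d);     /* steps 2-3: low power entry */

    bad_access = d.low_power;       /* device access between 3 and 4 */

    if (hold_usage_count)
        fake_put(&d);               /* step 4: other IOCTL finishes */

    return bad_access;
}
```

In this model `run_scenario(false)` reports an access to a device in low power, while `run_scenario(true)` does not: the low power state is entered only after the in-flight IOCTL drops its reference.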
    
    pm_runtime_resume_and_get() will be the first call, so its error
    should not be propagated to user space directly. For example,
    pm_runtime_resume_and_get() can return -EINVAL in cases where the
    user has passed the correct argument. So the
    pm_runtime_resume_and_get() errors have been masked behind -EIO.
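
The error masking can be illustrated with a short user-space sketch. The names `resume_fn`, `resume_ok()`, `resume_einval()`, `sketch_pm_runtime_get()` and `sketch_ioctl()` are hypothetical; the callback merely stands in for pm_runtime_resume_and_get() so that a failure can be injected.

```c
#include <errno.h>

/* Hypothetical callback standing in for pm_runtime_resume_and_get(). */
typedef int (*resume_fn)(void);

static int resume_ok(void)
{
    return 0;
}

/* Injects the misleading error code mentioned in the text. */
static int resume_einval(void)
{
    return -EINVAL;
}

/* Any resume failure is reported as -EIO, whatever the original code. */
static int sketch_pm_runtime_get(resume_fn resume)
{
    int ret = resume();

    if (ret)
        return -EIO;

    return 0;
}

/* Models the start of the IOCTL path: the PM step runs first. */
static long sketch_ioctl(resume_fn resume)
{
    int ret = sketch_pm_runtime_get(resume);

    if (ret)
        return ret;

    /* ... the actual IOCTL would be serviced here ... */
    return 0;
}
```

With this shape, user space never sees -EINVAL for an argument that was in fact valid; a resume failure always surfaces as -EIO.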

    Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
    Link: https://lore.kernel.org/r/20220829114850.4341-3-abhsahu@nvidia.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_main.c 55.6 KB