Commit 4138f022 authored by Linus Torvalds

Merge tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Add warning in unlikely case that device is not captured with
   driver_override (Kunwu Chan)

 - Error handling improvements in mlx5-vfio-pci to detect firmware
   tracking object error states, logging of the firmware error syndrome,
   and releasing of firmware resources in an aborted migration sequence
   (Yishai Hadas)

 - Correct an un-alphabetized VFIO MAINTAINERS entry (Alex Williamson)

 - Make the mdev_bus_type const and also make the class struct const for
   a couple of the vfio-mdev sample drivers (Ricardo B. Marliere)

 - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's
   Grace-Hopper superchip. During initialization of the chip-to-chip
   interconnect in this hardware module, the PCI BARs of the device
   become unused in favor of a faster, coherent mechanism for exposing
   device memory. This driver primarily changes the VFIO representation
   of the device so that this coherent aperture masquerades as the
   physical PCI BARs for userspace drivers. It also makes use of a new
   vma flag that allows KVM to use write-combining attributes for
   uncached device memory (Ankit Agrawal)

 - Reset fixes and cleanups for the pds-vfio-pci driver. Save and
   restore files were previously leaked if the device didn't pass
   through an error state; this is resolved, and a follow-up fix
   prevents access to the now-freed files. Reset handling is also
   refactored to remove the complicated deferred reset mechanism (Brett
   Creeley)

 - Remove some references to pl330 in the vfio-platform amba driver
   (Geert Uytterhoeven)

 - Remove twice redundant and ugly code to unpin incidental pins of the
   zero-page (Alex Williamson)

 - Deferred reset logic is also removed from the hisi-acc-vfio-pci
   driver as a simplification (Shameer Kolothum)

 - Require mlx5-vfio-pci devices to support PRE_COPY and remove the
   code that became unnecessary as a result. No device firmware lacking
   this support has ever been publicly available (Yishai Hadas)

 - Switch vfio-platform over to the .remove_new callback, in support of
   the broader transition to remove callbacks that return void (Uwe
   Kleine-König)

 - Resolve multiple issues in interrupt code for VFIO bus drivers that
   allow calling eventfd_signal() on a NULL context. This also removes a
   potential race in INTx setup on certain hardware for vfio-pci, races
   with various mechanisms to mask INTx, and leaked virqfds in
   vfio-platform (Alex Williamson)

* tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio: (29 commits)
  vfio/fsl-mc: Block calling interrupt handler without trigger
  vfio/platform: Create persistent IRQ handlers
  vfio/platform: Disable virqfds on cleanup
  vfio/pci: Create persistent INTx handler
  vfio: Introduce interface to flush virqfd inject workqueue
  vfio/pci: Lock external INTx masking ops
  vfio/pci: Disable auto-enable of exclusive INTx IRQ
  vfio/pds: Refactor/simplify reset logic
  vfio/pds: Make sure migration file isn't accessed after reset
  vfio/platform: Convert to platform remove callback returning void
  vfio/mlx5: Enforce PRE_COPY support
  vfio/mbochs: make mbochs_class constant
  vfio/mdpy: make mdpy_class constant
  hisi_acc_vfio_pci: Remove the deferred_reset logic
  Revert "vfio/type1: Unpin zero pages"
  vfio/nvgrace-gpu: Convey kvm to map device memory region as noncached
  vfio: amba: Rename pl330_ids[] to vfio_amba_ids[]
  vfio/pds: Always clear the save/restore FDs on reset
  vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
  vfio/pci: rename and export range_intersect_range
  ...
parents 4f712ee0 7447d911
@@ -23164,12 +23164,11 @@ L:	kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/pci/mlx5/
 
-VFIO VIRTIO PCI DRIVER
-M:	Yishai Hadas <yishaih@nvidia.com>
+VFIO NVIDIA GRACE GPU DRIVER
+M:	Ankit Agrawal <ankita@nvidia.com>
 L:	kvm@vger.kernel.org
-L:	virtualization@lists.linux.dev
-S:	Maintained
-F:	drivers/vfio/pci/virtio
+S:	Supported
+F:	drivers/vfio/pci/nvgrace-gpu/
 
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R:	Jason Gunthorpe <jgg@nvidia.com>
@@ -23194,6 +23193,13 @@ L:	kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/platform/
 
+VFIO VIRTIO PCI DRIVER
+M:	Yishai Hadas <yishaih@nvidia.com>
+L:	kvm@vger.kernel.org
+L:	virtualization@lists.linux.dev
+S:	Maintained
+F:	drivers/vfio/pci/virtio
+
 VGA_SWITCHEROO
 R:	Lukas Wunner <lukas@wunner.de>
 S:	Maintained
...
@@ -141,13 +141,14 @@ static int vfio_fsl_mc_set_irq_trigger(struct vfio_fsl_mc_device *vdev,
 	irq = &vdev->mc_irqs[index];
 
 	if (flags & VFIO_IRQ_SET_DATA_NONE) {
-		vfio_fsl_mc_irq_handler(hwirq, irq);
+		if (irq->trigger)
+			eventfd_signal(irq->trigger);
 
 	} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
 		u8 trigger = *(u8 *)data;
 
-		if (trigger)
-			vfio_fsl_mc_irq_handler(hwirq, irq);
+		if (trigger && irq->trigger)
+			eventfd_signal(irq->trigger);
 	}
 
 	return 0;
...
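The fsl-mc fix replaces a direct call into the interrupt handler with a guarded eventfd_signal(). As a result, a loopback trigger requested through the DATA_NONE or DATA_BOOL paths before userspace has attached an eventfd becomes a no-op instead of signalling a NULL context. The userspace sketch below models only that guard. `struct fake_eventfd`, `struct fake_irq`, and the `SKETCH_DATA_*` flag values are hypothetical stand-ins, not the kernel's types or the UAPI constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an eventfd context: it only counts signals. */
struct fake_eventfd {
	unsigned long count;
};

/* Hypothetical per-IRQ state: trigger stays NULL until userspace attaches one. */
struct fake_irq {
	struct fake_eventfd *trigger;
};

/* Illustrative flag values for this sketch only. */
#define SKETCH_DATA_NONE (1u << 0)
#define SKETCH_DATA_BOOL (1u << 1)

/* Dereferences ctx unconditionally, like eventfd_signal(): never pass NULL. */
static void fake_eventfd_signal(struct fake_eventfd *ctx)
{
	ctx->count++;
}

/* Same shape as the fixed code: signal only when a trigger is attached. */
static int sketch_set_irq_trigger(struct fake_irq *irq, uint32_t flags,
				  const void *data)
{
	if (flags & SKETCH_DATA_NONE) {
		if (irq->trigger)
			fake_eventfd_signal(irq->trigger);
	} else if (flags & SKETCH_DATA_BOOL) {
		uint8_t trigger = *(const uint8_t *)data;

		if (trigger && irq->trigger)
			fake_eventfd_signal(irq->trigger);
	}
	return 0;
}
```

Calling `sketch_set_irq_trigger()` on an IRQ whose `trigger` is NULL returns 0 without touching memory. With a trigger attached, each DATA_NONE call and each DATA_BOOL call with a non-zero byte bumps the counter once.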
@@ -40,7 +40,7 @@ static int mdev_match(struct device *dev, struct device_driver *drv)
 	return 0;
 }
 
-struct bus_type mdev_bus_type = {
+const struct bus_type mdev_bus_type = {
 	.name		= "mdev",
 	.probe		= mdev_probe,
 	.remove		= mdev_remove,
...
@@ -13,7 +13,7 @@
 int mdev_bus_register(void);
 void mdev_bus_unregister(void);
 
-extern struct bus_type mdev_bus_type;
+extern const struct bus_type mdev_bus_type;
 extern const struct attribute_group *mdev_device_groups[];
 
 #define to_mdev_type_attr(_attr) \
...
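The two mdev hunks make the same change on both sides: the definition of `mdev_bus_type` becomes const, and the `extern` declaration is updated to match. The compiler can then typically place the structure, including its callback pointers, in read-only data. Below is a minimal userspace sketch of the idiom. `struct sketch_bus_type` is a hypothetical reduction of `struct bus_type` to a name and one callback.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct bus_type. */
struct sketch_bus_type {
	const char *name;
	int (*probe)(int dev_id);
};

static int sketch_probe(int dev_id)
{
	return dev_id >= 0 ? 0 : -1;
}

/* Definition: const, so the table can live in read-only data. */
const struct sketch_bus_type sketch_mdev_bus_type = {
	.name = "mdev",
	.probe = sketch_probe,
};

/*
 * A header would carry the matching declaration, and both must agree:
 * extern const struct sketch_bus_type sketch_mdev_bus_type;
 */

/* Users only need read access through a pointer-to-const. */
static int sketch_bus_probe(const struct sketch_bus_type *bus, int dev_id)
{
	return bus->probe(dev_id);
}
```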
@@ -67,4 +67,6 @@ source "drivers/vfio/pci/pds/Kconfig"
 
 source "drivers/vfio/pci/virtio/Kconfig"
 
+source "drivers/vfio/pci/nvgrace-gpu/Kconfig"
+
 endmenu
@@ -15,3 +15,5 @@ obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
 
 obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
+
+obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu/
@@ -630,25 +630,11 @@ static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vde
 	}
 }
 
-/*
- * This function is called in all state_mutex unlock cases to
- * handle a 'deferred_reset' if exists.
- */
-static void
-hisi_acc_vf_state_mutex_unlock(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+static void hisi_acc_vf_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev)
 {
-again:
-	spin_lock(&hisi_acc_vdev->reset_lock);
-	if (hisi_acc_vdev->deferred_reset) {
-		hisi_acc_vdev->deferred_reset = false;
-		spin_unlock(&hisi_acc_vdev->reset_lock);
-		hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
-		hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
-		hisi_acc_vf_disable_fds(hisi_acc_vdev);
-		goto again;
-	}
-	mutex_unlock(&hisi_acc_vdev->state_mutex);
-	spin_unlock(&hisi_acc_vdev->reset_lock);
+	hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
+	hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+	hisi_acc_vf_disable_fds(hisi_acc_vdev);
 }
 
 static void hisi_acc_vf_start_device(struct hisi_acc_vf_core_device *hisi_acc_vdev)
@@ -804,8 +790,10 @@ static long hisi_acc_vf_precopy_ioctl(struct file *filp,
 	info.dirty_bytes = 0;
 	info.initial_bytes = migf->total_length - *pos;
+	mutex_unlock(&migf->lock);
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
 
-	ret = copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+	return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
 out:
 	mutex_unlock(&migf->lock);
 	mutex_unlock(&hisi_acc_vdev->state_mutex);
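In `hisi_acc_vf_precopy_ioctl()`, the reply is now filled in while `migf->lock` and `state_mutex` are held. Both locks are then dropped, and only after that is `copy_to_user()` called. The removed reset comment describes an ABBA deadlock risk between `state_mutex` and `mm_lock`. A faulting `copy_to_user()` can take `mm_lock`, so no longer holding `state_mutex` across it is presumably what lets the deferred-reset workaround go. The sketch below shows the same "snapshot under the lock, copy after unlocking" shape in userspace with POSIX mutexes. `struct sketch_dev` and `struct sketch_info` are assumptions, and the `memcpy()` stands in for `copy_to_user()`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical reply structure, loosely modeled on the precopy info. */
struct sketch_info {
	unsigned long long initial_bytes;
	unsigned long long dirty_bytes;
};

/* Hypothetical device: one mutex guards the migration state. */
struct sketch_dev {
	pthread_mutex_t state_mutex;
	int in_precopy;
	unsigned long long total_length;
};

/* Fill *user_out without holding state_mutex during the copy itself. */
static int sketch_precopy_info(struct sketch_dev *dev, unsigned long long pos,
			       struct sketch_info *user_out)
{
	struct sketch_info info = { 0 };

	pthread_mutex_lock(&dev->state_mutex);
	if (!dev->in_precopy || pos > dev->total_length) {
		pthread_mutex_unlock(&dev->state_mutex);
		return -EINVAL;
	}
	/* Snapshot everything needed while the state is stable. */
	info.dirty_bytes = 0;
	info.initial_bytes = dev->total_length - pos;
	pthread_mutex_unlock(&dev->state_mutex);

	/* The copy runs with no lock held (copy_to_user() in the kernel). */
	memcpy(user_out, &info, sizeof(info));
	return 0;
}
```

The trade-off is that the copied data may be slightly stale by the time the caller sees it. For a point-in-time size report, that is acceptable.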
@@ -1071,7 +1059,7 @@ hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
 			break;
 		}
 	}
-	hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
 	return res;
 }
@@ -1092,7 +1080,7 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
 	mutex_lock(&hisi_acc_vdev->state_mutex);
 	*curr_state = hisi_acc_vdev->mig_state;
-	hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
 	return 0;
 }
@@ -1104,21 +1092,9 @@ static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
 	    VFIO_MIGRATION_STOP_COPY)
 		return;
 
-	/*
-	 * As the higher VFIO layers are holding locks across reset and using
-	 * those same locks with the mm_lock we need to prevent ABBA deadlock
-	 * with the state_mutex and mm_lock.
-	 * In case the state_mutex was taken already we defer the cleanup work
-	 * to the unlock flow of the other running context.
-	 */
-	spin_lock(&hisi_acc_vdev->reset_lock);
-	hisi_acc_vdev->deferred_reset = true;
-	if (!mutex_trylock(&hisi_acc_vdev->state_mutex)) {
-		spin_unlock(&hisi_acc_vdev->reset_lock);
-		return;
-	}
-	spin_unlock(&hisi_acc_vdev->reset_lock);
-	hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
+	mutex_lock(&hisi_acc_vdev->state_mutex);
+	hisi_acc_vf_reset(hisi_acc_vdev);
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
 }
 
 static int hisi_acc_vf_qm_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
...
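With the deferred mechanism gone, the AER reset-done path simply takes `state_mutex`, resets the migration state, and releases the lock. There is no longer a `reset_lock` spinlock, a `deferred_reset` flag, or a `mutex_trylock()` retry loop. The sketch below models that flow in userspace. `struct sketch_vf`, its fields, and the state enums are hypothetical simplifications of `struct hisi_acc_vf_core_device`, not the driver's definitions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the QM and migration states. */
enum sketch_qm_state { SKETCH_QM_NOT_READY, SKETCH_QM_READY };
enum sketch_mig_state { SKETCH_STATE_RUNNING, SKETCH_STATE_STOP_COPY };

struct sketch_vf {
	pthread_mutex_t state_mutex;
	enum sketch_qm_state vf_qm_state;
	enum sketch_mig_state mig_state;
	int open_migration_fds; /* stands in for resuming/saving files */
};

/* Caller must hold state_mutex, as with hisi_acc_vf_reset(). */
static void sketch_vf_reset(struct sketch_vf *vf)
{
	vf->vf_qm_state = SKETCH_QM_NOT_READY;
	vf->mig_state = SKETCH_STATE_RUNNING;
	vf->open_migration_fds = 0; /* like hisi_acc_vf_disable_fds() */
}

/* Reset-done handler: plain lock, reset, unlock; nothing is deferred. */
static void sketch_reset_done(struct sketch_vf *vf)
{
	pthread_mutex_lock(&vf->state_mutex);
	sketch_vf_reset(vf);
	pthread_mutex_unlock(&vf->state_mutex);
}
```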
@@ -98,8 +98,8 @@ struct hisi_acc_vf_migration_file {
 struct hisi_acc_vf_core_device {
 	struct vfio_pci_core_device core_device;
-	u8 match_done:1;
-	u8 deferred_reset:1;
+	u8 match_done;
+
 	/* For migration state */
 	struct mutex state_mutex;
 	enum vfio_device_mig_state mig_state;
@@ -109,8 +109,6 @@ struct hisi_acc_vf_core_device {
 	struct hisi_qm vf_qm;
 	u32 vf_qm_state;
 	int vf_id;
-	/* For reset handler */
-	spinlock_t reset_lock;
 	struct hisi_acc_vf_migration_file *resuming_migf;
 	struct hisi_acc_vf_migration_file *saving_migf;
 };
...
@@ -108,8 +108,9 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 		ret = wait_for_completion_interruptible(&mvdev->saving_migf->save_comp);
 		if (ret)
 			return ret;
-		if (mvdev->saving_migf->state ==
-		    MLX5_MIGF_STATE_PRE_COPY_ERROR) {
+		/* Upon cleanup, ignore previous pre_copy error state */
+		if (mvdev->saving_migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR &&
+		    !(query_flags & MLX5VF_QUERY_CLEANUP)) {
 			/*
 			 * In case we had a PRE_COPY error, only query full
 			 * image for final image
@@ -121,6 +122,11 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 			}
 			query_flags &= ~MLX5VF_QUERY_INC;
 		}
+		/* Block incremental query which is state-dependent */
+		if (mvdev->saving_migf->state == MLX5_MIGF_STATE_ERROR) {
+			complete(&mvdev->saving_migf->save_comp);
+			return -ENODEV;
+		}
 	}
 
 	MLX5_SET(query_vhca_migration_state_in, in, opcode,
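Once the pending save has completed, `mlx5vf_cmd_query_vhca_migration_state()` now applies two state checks:

- A previous PRE_COPY error drops the incremental bit, so only the full image is queried, unless the caller passes the new `MLX5VF_QUERY_CLEANUP` flag.
- A hard `MLX5_MIGF_STATE_ERROR` blocks the query with `-ENODEV`.

The diff view elides part of the PRE_COPY error branch, so the userspace sketch below models only the visible flag adjustment and the `-ENODEV` check. The `SKETCH_QUERY_*` bits mirror the `MLX5VF_QUERY_*` enum in the diff. The state enum is a hypothetical subset, and the completion handling and firmware command are left out.

```c
#include <assert.h>
#include <errno.h>

/* Same bit values as the MLX5VF_QUERY_* enum in the diff. */
enum {
	SKETCH_QUERY_INC = (1U << 0),
	SKETCH_QUERY_FINAL = (1U << 1),
	SKETCH_QUERY_CLEANUP = (1U << 2),
};

/* Hypothetical subset of the migration-file states. */
enum sketch_migf_state {
	SKETCH_MIGF_ERROR = 1,
	SKETCH_MIGF_PRE_COPY_ERROR,
	SKETCH_MIGF_PRE_COPY,
};

/* Returns 0 or -ENODEV and may clear SKETCH_QUERY_INC in *flags. */
static int sketch_adjust_query(enum sketch_migf_state state, unsigned int *flags)
{
	/* Upon cleanup, a previous PRE_COPY error is ignored. */
	if (state == SKETCH_MIGF_PRE_COPY_ERROR && !(*flags & SKETCH_QUERY_CLEANUP))
		*flags &= ~SKETCH_QUERY_INC; /* query the full image instead */

	/* A hard error blocks the state-dependent incremental query. */
	if (state == SKETCH_MIGF_ERROR)
		return -ENODEV;
	return 0;
}
```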
@@ -149,6 +155,12 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 	return 0;
 }
 
+static void set_tracker_change_event(struct mlx5vf_pci_core_device *mvdev)
+{
+	mvdev->tracker.object_changed = true;
+	complete(&mvdev->tracker_comp);
+}
+
 static void set_tracker_error(struct mlx5vf_pci_core_device *mvdev)
 {
 	/* Mark the tracker under an error and wake it up if it's running */
@@ -189,7 +201,7 @@ void mlx5vf_cmd_close_migratable(struct mlx5vf_pci_core_device *mvdev)
 	/* Must be done outside the lock to let it progress */
 	set_tracker_error(mvdev);
 	mutex_lock(&mvdev->state_mutex);
-	mlx5vf_disable_fds(mvdev);
+	mlx5vf_disable_fds(mvdev, NULL);
 	_mlx5vf_free_page_tracker_resources(mvdev);
 	mlx5vf_state_mutex_unlock(mvdev);
 }
@@ -221,6 +233,10 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 	if (!MLX5_CAP_GEN(mvdev->mdev, migration))
 		goto end;
 
+	if (!(MLX5_CAP_GEN_2(mvdev->mdev, migration_multi_load) &&
+	      MLX5_CAP_GEN_2(mvdev->mdev, migration_tracking_state)))
+		goto end;
+
 	mvdev->vf_id = pci_iov_vf_id(pdev);
 	if (mvdev->vf_id < 0)
 		goto end;
@@ -250,17 +266,14 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 	mvdev->migrate_cap = 1;
 	mvdev->core_device.vdev.migration_flags =
 		VFIO_MIGRATION_STOP_COPY |
-		VFIO_MIGRATION_P2P;
+		VFIO_MIGRATION_P2P |
+		VFIO_MIGRATION_PRE_COPY;
+
 	mvdev->core_device.vdev.mig_ops = mig_ops;
 	init_completion(&mvdev->tracker_comp);
 	if (MLX5_CAP_GEN(mvdev->mdev, adv_virtualization))
 		mvdev->core_device.vdev.log_ops = log_ops;
 
-	if (MLX5_CAP_GEN_2(mvdev->mdev, migration_multi_load) &&
-	    MLX5_CAP_GEN_2(mvdev->mdev, migration_tracking_state))
-		mvdev->core_device.vdev.migration_flags |=
-			VFIO_MIGRATION_PRE_COPY;
-
 	if (MLX5_CAP_GEN_2(mvdev->mdev, migration_in_chunks))
 		mvdev->chunk_mode = 1;
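Taken together, the two `mlx5vf_cmd_set_migratable()` hunks turn PRE_COPY from an optional feature into a prerequisite. A device lacking either `migration_multi_load` or `migration_tracking_state` now skips to `end` and is never marked migratable. Any device that does qualify always advertises `VFIO_MIGRATION_PRE_COPY`. This is also why the `MLX5VF_PRE_COPY_SUPP()` checks and the header-less resume path can be deleted elsewhere in the diff. Below is a userspace sketch of the resulting decision. The capability booleans and flag bits are hypothetical, not the real `MLX5_CAP_GEN_2()` accessors or UAPI constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits for this sketch, not the UAPI values. */
#define SKETCH_MIG_STOP_COPY (1u << 0)
#define SKETCH_MIG_P2P (1u << 1)
#define SKETCH_MIG_PRE_COPY (1u << 2)

/* Hypothetical capability snapshot for one device. */
struct sketch_caps {
	bool migration;
	bool migration_multi_load;
	bool migration_tracking_state;
};

/* Returns the advertised migration flags, or 0 if migration is disabled. */
static unsigned int sketch_migration_flags(const struct sketch_caps *caps)
{
	if (!caps->migration)
		return 0;
	/* PRE_COPY support is now mandatory rather than optional. */
	if (!(caps->migration_multi_load && caps->migration_tracking_state))
		return 0;
	return SKETCH_MIG_STOP_COPY | SKETCH_MIG_P2P | SKETCH_MIG_PRE_COPY;
}
```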
@@ -402,6 +415,50 @@ void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf)
 	kfree(buf);
 }
 
+static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
+				      unsigned int npages)
+{
+	unsigned int to_alloc = npages;
+	struct page **page_list;
+	unsigned long filled;
+	unsigned int to_fill;
+	int ret;
+
+	to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list));
+	page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL_ACCOUNT);
+	if (!page_list)
+		return -ENOMEM;
+
+	do {
+		filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill,
+						page_list);
+		if (!filled) {
+			ret = -ENOMEM;
+			goto err;
+		}
+		to_alloc -= filled;
+		ret = sg_alloc_append_table_from_pages(
+			&buf->table, page_list, filled, 0,
+			filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC,
+			GFP_KERNEL_ACCOUNT);
+
+		if (ret)
+			goto err;
+		buf->allocated_length += filled * PAGE_SIZE;
+		/* clean input for another bulk allocation */
+		memset(page_list, 0, filled * sizeof(*page_list));
+		to_fill = min_t(unsigned int, to_alloc,
+				PAGE_SIZE / sizeof(*page_list));
+	} while (to_alloc > 0);
+
+	kvfree(page_list);
+	return 0;
+
+err:
+	kvfree(page_list);
+	return ret;
+}
+
 struct mlx5_vhca_data_buffer *
 mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 			 size_t length,
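`mlx5vf_add_migration_pages()` is moved here and made static. It works in rounds:

1. Each round is sized so the temporary `page_list` array fits in one page, i.e. `PAGE_SIZE / sizeof(struct page *)` entries (512 on a 64-bit kernel with 4 KiB pages).
2. It asks the bulk allocator for up to that many pages.
3. It appends whatever was returned to the scatter-gather table, clears the used slots, and repeats until `npages` pages have been obtained.

The userspace sketch below reproduces only that batching loop. `SKETCH_PAGE_SIZE`, `sketch_bulk_alloc()`, and `out[]` are hypothetical. The allocator takes a `cap` argument so the sketch can simulate partial fills, and `out[]` stands in for the sg table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical bulk allocator: fills up to min(n, cap) slots of list[]. */
static unsigned long sketch_bulk_alloc(unsigned int n, void **list,
				       unsigned int cap)
{
	unsigned long filled = 0;

	for (unsigned int i = 0; i < n && filled < cap; i++) {
		list[i] = malloc(SKETCH_PAGE_SIZE);
		if (!list[i])
			break;
		filled++;
	}
	return filled;
}

/*
 * Obtain npages "pages" in rounds whose pointer array fits in one page,
 * appending each round to out[]. Returns 0 or -1 and reports the number
 * of rounds. On failure, pages already appended stay in out[] for the
 * caller to free.
 */
static int sketch_add_pages(void **out, unsigned int npages, unsigned int cap,
			    unsigned int *rounds)
{
	const unsigned int per_round = SKETCH_PAGE_SIZE / sizeof(void *);
	unsigned int to_alloc = npages;
	unsigned int have = 0;
	unsigned int to_fill = npages < per_round ? npages : per_round;
	void **list;

	if (!npages)
		return -1;
	list = calloc(to_fill, sizeof(*list));
	if (!list)
		return -1;

	*rounds = 0;
	do {
		unsigned long filled = sketch_bulk_alloc(to_fill, list, cap);

		if (!filled) {
			free(list);
			return -1;
		}
		to_alloc -= filled;
		memcpy(&out[have], list, filled * sizeof(*list));
		have += filled;
		(*rounds)++;
		/* Clear the used slots so the next round starts empty. */
		memset(list, 0, filled * sizeof(*list));
		to_fill = to_alloc < per_round ? to_alloc : per_round;
	} while (to_alloc > 0);

	free(list);
	return 0;
}
```

Bounding the scratch array to one page keeps its allocation small regardless of `npages`. Accepting partial fills lets the loop make progress even when the allocator returns fewer pages than requested.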
@@ -608,8 +665,13 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 
 err:
 	/* The error flow can't run from an interrupt context */
-	if (status == -EREMOTEIO)
+	if (status == -EREMOTEIO) {
 		status = MLX5_GET(save_vhca_state_out, async_data->out, status);
+		/* Failed in FW, print cmd out failure details */
+		mlx5_cmd_out_err(migf->mvdev->mdev, MLX5_CMD_OP_SAVE_VHCA_STATE, 0,
+				 async_data->out);
+	}
+
 	async_data->status = status;
 	queue_work(migf->mvdev->cb_wq, &async_data->work);
 }
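In the save callback's error path, an `-EREMOTEIO` status (per the diff's own comment, a failure in firmware) is still replaced by the status field from the command output. It now also triggers `mlx5_cmd_out_err()` so the failure details are logged. The sketch below models that flow in userspace. `struct sketch_cmd_out`, the `sketch_log` buffer, and `sketch_cmd_out_err()` are hypothetical stand-ins, and the message format is invented for illustration rather than copied from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical command output carrying the firmware status and syndrome. */
struct sketch_cmd_out {
	unsigned int status;
	unsigned int syndrome;
};

/* Last logged message; stands in for the kernel log. */
static char sketch_log[128];

/* Stand-in for mlx5_cmd_out_err(): record the failure details. */
static void sketch_cmd_out_err(const char *op, const struct sketch_cmd_out *out)
{
	snprintf(sketch_log, sizeof(sketch_log),
		 "%s failed, status 0x%x, syndrome 0x%x", op, out->status,
		 out->syndrome);
}

/* Error path: for a firmware failure, use the FW status and log details. */
static int sketch_save_error_status(int status, const struct sketch_cmd_out *out)
{
	if (status == -EREMOTEIO) {
		status = (int)out->status;
		sketch_cmd_out_err("SAVE_VHCA_STATE", out);
	}
	return status;
}
```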
@@ -623,6 +685,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
 	struct mlx5_vhca_data_buffer *header_buf = NULL;
 	struct mlx5vf_async_data *async_data;
+	bool pre_copy_cleanup = false;
 	int err;
 
 	lockdep_assert_held(&mvdev->state_mutex);
@@ -633,6 +696,10 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (err)
 		return err;
 
+	if ((migf->state == MLX5_MIGF_STATE_PRE_COPY ||
+	     migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR) && !track && !inc)
+		pre_copy_cleanup = true;
+
 	if (migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR)
 		/*
 		 * In case we had a PRE_COPY error, SAVE is triggered only for
@@ -651,29 +718,27 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 
 	async_data = &migf->async_data;
 	async_data->buf = buf;
-	async_data->stop_copy_chunk = !track;
+	async_data->stop_copy_chunk = (!track && !pre_copy_cleanup);
 	async_data->out = kvzalloc(out_size, GFP_KERNEL);
 	if (!async_data->out) {
 		err = -ENOMEM;
 		goto err_out;
 	}
 
-	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
-		if (async_data->stop_copy_chunk) {
-			u8 header_idx = buf->stop_copy_chunk_num ?
-				buf->stop_copy_chunk_num - 1 : 0;
+	if (async_data->stop_copy_chunk) {
+		u8 header_idx = buf->stop_copy_chunk_num ?
+			buf->stop_copy_chunk_num - 1 : 0;
 
-			header_buf = migf->buf_header[header_idx];
-			migf->buf_header[header_idx] = NULL;
-		}
+		header_buf = migf->buf_header[header_idx];
+		migf->buf_header[header_idx] = NULL;
+	}
 
-		if (!header_buf) {
-			header_buf = mlx5vf_get_data_buffer(migf,
-				sizeof(struct mlx5_vf_migration_header), DMA_NONE);
-			if (IS_ERR(header_buf)) {
-				err = PTR_ERR(header_buf);
-				goto err_free;
-			}
+	if (!header_buf) {
+		header_buf = mlx5vf_get_data_buffer(migf,
+			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+		if (IS_ERR(header_buf)) {
+			err = PTR_ERR(header_buf);
+			goto err_free;
 		}
 	}
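`mlx5vf_cmd_save_vhca_state()` now classifies some saves as cleanup saves: those issued with neither `track` nor `inc` set while the file is still in PRE_COPY or PRE_COPY_ERROR. The PRE_COPY to RUNNING transition at the end of this diff appears to use such a save to release firmware resources. A cleanup save is no longer counted as a STOP_COPY chunk: `stop_copy_chunk = !track && !pre_copy_cleanup`. The predicate is small enough to check exhaustively. The enum below is a hypothetical subset of `enum mlx5_vf_migf_state`, with `SKETCH_MIGF_OTHER` standing for any state outside PRE_COPY.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the migration-file states. */
enum sketch_migf_state {
	SKETCH_MIGF_ERROR = 1,
	SKETCH_MIGF_PRE_COPY_ERROR,
	SKETCH_MIGF_PRE_COPY,
	SKETCH_MIGF_OTHER,
};

/* Mirrors the pre_copy_cleanup condition added in the diff. */
static bool sketch_pre_copy_cleanup(enum sketch_migf_state state, bool track,
				    bool inc)
{
	return (state == SKETCH_MIGF_PRE_COPY ||
		state == SKETCH_MIGF_PRE_COPY_ERROR) && !track && !inc;
}

/* A save is a STOP_COPY chunk only if it is neither tracked nor cleanup. */
static bool sketch_stop_copy_chunk(enum sketch_migf_state state, bool track,
				   bool inc)
{
	return !track && !sketch_pre_copy_cleanup(state, track, inc);
}
```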
@@ -900,6 +965,29 @@ static int mlx5vf_cmd_modify_tracker(struct mlx5_core_dev *mdev,
 	return mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
 }
 
+static int mlx5vf_cmd_query_tracker(struct mlx5_core_dev *mdev,
+				    struct mlx5_vhca_page_tracker *tracker)
+{
+	u32 out[MLX5_ST_SZ_DW(query_page_track_obj_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(general_obj_in_cmd_hdr)] = {};
+	void *obj_context;
+	void *cmd_hdr;
+	int err;
+
+	cmd_hdr = MLX5_ADDR_OF(modify_page_track_obj_in, in, general_obj_in_cmd_hdr);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_PAGE_TRACK);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, tracker->id);
+
+	err = mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
+	if (err)
+		return err;
+
+	obj_context = MLX5_ADDR_OF(query_page_track_obj_out, out, obj_context);
+	tracker->status = MLX5_GET(page_track, obj_context, state);
+	return 0;
+}
+
 static int alloc_cq_frag_buf(struct mlx5_core_dev *mdev,
 			     struct mlx5_vhca_cq_buf *buf, int nent,
 			     int cqe_size)
@@ -957,9 +1045,11 @@ static int mlx5vf_event_notifier(struct notifier_block *nb, unsigned long type,
 		mlx5_nb_cof(nb, struct mlx5_vhca_page_tracker, nb);
 	struct mlx5vf_pci_core_device *mvdev = container_of(
 		tracker, struct mlx5vf_pci_core_device, tracker);
+	struct mlx5_eqe_obj_change *object;
 	struct mlx5_eqe *eqe = data;
 	u8 event_type = (u8)type;
 	u8 queue_type;
+	u32 obj_id;
 	int qp_num;
 
 	switch (event_type) {
@@ -975,6 +1065,12 @@ static int mlx5vf_event_notifier(struct notifier_block *nb, unsigned long type,
 			break;
 		set_tracker_error(mvdev);
 		break;
+	case MLX5_EVENT_TYPE_OBJECT_CHANGE:
+		object = &eqe->data.obj_change;
+		obj_id = be32_to_cpu(object->obj_id);
+		if (obj_id == tracker->id)
+			set_tracker_change_event(mvdev);
+		break;
 	default:
 		break;
 	}
@@ -1634,6 +1730,11 @@ int mlx5vf_tracker_read_and_clear(struct vfio_device *vdev, unsigned long iova,
 		goto end;
 	}
 
+	if (tracker->is_err) {
+		err = -EIO;
+		goto end;
+	}
+
 	mdev = mvdev->mdev;
 	err = mlx5vf_cmd_modify_tracker(mdev, tracker->id, iova, length,
 					MLX5_PAGE_TRACK_STATE_REPORTING);
@@ -1652,6 +1753,12 @@ int mlx5vf_tracker_read_and_clear(struct vfio_device *vdev, unsigned long iova,
 						      dirty, &tracker->status);
 			if (poll_err == CQ_EMPTY) {
 				wait_for_completion(&mvdev->tracker_comp);
+				if (tracker->object_changed) {
+					tracker->object_changed = false;
+					err = mlx5vf_cmd_query_tracker(mdev, tracker);
+					if (err)
+						goto end;
+				}
 				continue;
 			}
 		}
...
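The dirty-tracking changes add a second reason to wake a waiter on `tracker_comp`. Besides the existing error path, an `MLX5_EVENT_TYPE_OBJECT_CHANGE` event whose object id matches the tracker sets `object_changed` and completes the completion. `mlx5vf_tracker_read_and_clear()` now refuses to start once `is_err` is set, returning `-EIO`. After an empty CQ poll and a wakeup, it clears `object_changed` and re-reads the tracker state with `mlx5vf_cmd_query_tracker()` before looping. The single-threaded userspace model below captures that flag handshake. `struct sketch_tracker`, the `hw_state` field standing in for the firmware query, and the `completions` counter standing in for `struct completion` are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the tracker fields involved in the handshake. */
struct sketch_tracker {
	uint32_t id;
	bool is_err;
	bool object_changed;
	int status;      /* last state read back from "firmware" */
	int hw_state;    /* hypothetical firmware-side state */
	int completions; /* stands in for struct completion */
};

/* Event side: react only to changes of this tracker's own object. */
static void sketch_object_change_event(struct sketch_tracker *t,
				       uint32_t obj_id)
{
	if (obj_id != t->id)
		return;
	t->object_changed = true;
	t->completions++; /* complete() */
}

/* Stand-in for mlx5vf_cmd_query_tracker(): refresh the cached status. */
static int sketch_query_tracker(struct sketch_tracker *t)
{
	t->status = t->hw_state;
	return 0;
}

/* Entry check added to the read-and-clear path. */
static int sketch_read_and_clear_begin(const struct sketch_tracker *t)
{
	return t->is_err ? -EIO : 0;
}

/* Waiter side, after an empty CQ poll: consume the wakeup, then re-query. */
static int sketch_after_wakeup(struct sketch_tracker *t)
{
	assert(t->completions > 0); /* wait_for_completion() would block here */
	t->completions--;
	if (t->object_changed) {
		t->object_changed = false;
		return sketch_query_tracker(t);
	}
	return 0;
}
```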
@@ -13,9 +13,6 @@
 #include <linux/mlx5/cq.h>
 #include <linux/mlx5/qp.h>
 
-#define MLX5VF_PRE_COPY_SUPP(mvdev) \
-	((mvdev)->core_device.vdev.migration_flags & VFIO_MIGRATION_PRE_COPY)
-
 enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_ERROR = 1,
 	MLX5_MIGF_STATE_PRE_COPY_ERROR,
@@ -25,7 +22,6 @@ enum mlx5_vf_migf_state {
 };
 
 enum mlx5_vf_load_state {
-	MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER,
 	MLX5_VF_LOAD_STATE_READ_HEADER,
 	MLX5_VF_LOAD_STATE_PREP_HEADER_DATA,
 	MLX5_VF_LOAD_STATE_READ_HEADER_DATA,
@@ -162,6 +158,7 @@ struct mlx5_vhca_page_tracker {
 	u32 id;
 	u32 pdn;
 	u8 is_err:1;
+	u8 object_changed:1;
 	struct mlx5_uars_page *uar;
 	struct mlx5_vhca_cq cq;
 	struct mlx5_vhca_qp *host_qp;
@@ -196,6 +193,7 @@ struct mlx5vf_pci_core_device {
 enum {
 	MLX5VF_QUERY_INC = (1UL << 0),
 	MLX5VF_QUERY_FINAL = (1UL << 1),
+	MLX5VF_QUERY_CLEANUP = (1UL << 2),
 };
 
 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
@@ -226,12 +224,11 @@ struct mlx5_vhca_data_buffer *
 mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf,
 		       size_t length, enum dma_data_direction dma_dir);
 void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf);
-int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
-			       unsigned int npages);
 struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
 				       unsigned long offset);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
-void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
+void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev,
+			enum mlx5_vf_migf_state *last_save_state);
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
 void mlx5vf_mig_file_set_save_work(struct mlx5_vf_migration_file *migf,
 				   u8 chunk_num, size_t next_required_umem_size);
...
@@ -65,50 +65,6 @@ mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
 	return NULL;
 }
 
-int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
-			       unsigned int npages)
-{
-	unsigned int to_alloc = npages;
-	struct page **page_list;
-	unsigned long filled;
-	unsigned int to_fill;
-	int ret;
-
-	to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list));
-	page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL_ACCOUNT);
-	if (!page_list)
-		return -ENOMEM;
-
-	do {
-		filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill,
-						page_list);
-		if (!filled) {
-			ret = -ENOMEM;
-			goto err;
-		}
-		to_alloc -= filled;
-		ret = sg_alloc_append_table_from_pages(
-			&buf->table, page_list, filled, 0,
-			filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC,
-			GFP_KERNEL_ACCOUNT);
-
-		if (ret)
-			goto err;
-		buf->allocated_length += filled * PAGE_SIZE;
-		/* clean input for another bulk allocation */
-		memset(page_list, 0, filled * sizeof(*page_list));
-		to_fill = min_t(unsigned int, to_alloc,
-				PAGE_SIZE / sizeof(*page_list));
-	} while (to_alloc > 0);
-
-	kvfree(page_list);
-	return 0;
-
-err:
-	kvfree(page_list);
-	return ret;
-}
-
 static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf)
 {
 	mutex_lock(&migf->lock);
@@ -777,36 +733,6 @@ mlx5vf_append_page_to_mig_buf(struct mlx5_vhca_data_buffer *vhca_buf,
 	return 0;
 }
 
-static int
-mlx5vf_resume_read_image_no_header(struct mlx5_vhca_data_buffer *vhca_buf,
-				   loff_t requested_length,
-				   const char __user **buf, size_t *len,
-				   loff_t *pos, ssize_t *done)
-{
-	int ret;
-
-	if (requested_length > MAX_LOAD_SIZE)
-		return -ENOMEM;
-
-	if (vhca_buf->allocated_length < requested_length) {
-		ret = mlx5vf_add_migration_pages(
-			vhca_buf,
-			DIV_ROUND_UP(requested_length - vhca_buf->allocated_length,
-				     PAGE_SIZE));
-		if (ret)
-			return ret;
-	}
-
-	while (*len) {
-		ret = mlx5vf_append_page_to_mig_buf(vhca_buf, buf, len, pos,
-						    done);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 static ssize_t
 mlx5vf_resume_read_image(struct mlx5_vf_migration_file *migf,
 			 struct mlx5_vhca_data_buffer *vhca_buf,
@@ -1038,13 +964,6 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 			migf->load_state = MLX5_VF_LOAD_STATE_READ_IMAGE;
 			break;
 		}
-		case MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER:
-			ret = mlx5vf_resume_read_image_no_header(vhca_buf,
-						requested_length,
-						&buf, &len, pos, &done);
-			if (ret)
-				goto out_unlock;
-			break;
 		case MLX5_VF_LOAD_STATE_READ_IMAGE:
 			ret = mlx5vf_resume_read_image(migf, vhca_buf,
 						migf->record_size,
@@ -1114,21 +1033,16 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	}
 
 	migf->buf[0] = buf;
-	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
-		buf = mlx5vf_alloc_data_buffer(migf,
-			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
-		if (IS_ERR(buf)) {
-			ret = PTR_ERR(buf);
-			goto out_buf;
-		}
-
-		migf->buf_header[0] = buf;
-		migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
-	} else {
-		/* Initial state will be to read the image */
-		migf->load_state = MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER;
+	buf = mlx5vf_alloc_data_buffer(migf,
+		sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		goto out_buf;
 	}
 
+	migf->buf_header[0] = buf;
+	migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
+
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	INIT_LIST_HEAD(&migf->buf_list);
@@ -1146,7 +1060,8 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	return ERR_PTR(ret);
 }
 
-void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
+void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev,
+			enum mlx5_vf_migf_state *last_save_state)
 {
 	if (mvdev->resuming_migf) {
 		mlx5vf_disable_fd(mvdev->resuming_migf);
@@ -1157,6 +1072,8 @@ void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
if (mvdev->saving_migf) { if (mvdev->saving_migf) {
mlx5_cmd_cleanup_async_ctx(&mvdev->saving_migf->async_ctx); mlx5_cmd_cleanup_async_ctx(&mvdev->saving_migf->async_ctx);
cancel_work_sync(&mvdev->saving_migf->async_data.work); cancel_work_sync(&mvdev->saving_migf->async_data.work);
if (last_save_state)
*last_save_state = mvdev->saving_migf->state;
mlx5vf_disable_fd(mvdev->saving_migf); mlx5vf_disable_fd(mvdev->saving_migf);
wake_up_interruptible(&mvdev->saving_migf->poll_wait); wake_up_interruptible(&mvdev->saving_migf->poll_wait);
mlx5fv_cmd_clean_migf_resources(mvdev->saving_migf); mlx5fv_cmd_clean_migf_resources(mvdev->saving_migf);
@@ -1217,12 +1134,34 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
return migf->filp; return migf->filp;
} }
if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) || if (cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) {
(cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_RUNNING) || mlx5vf_disable_fds(mvdev, NULL);
return NULL;
}
if ((cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_RUNNING) ||
(cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P &&
new == VFIO_DEVICE_STATE_RUNNING_P2P)) { new == VFIO_DEVICE_STATE_RUNNING_P2P)) {
mlx5vf_disable_fds(mvdev); struct mlx5_vf_migration_file *migf = mvdev->saving_migf;
return NULL; struct mlx5_vhca_data_buffer *buf;
enum mlx5_vf_migf_state state;
size_t size;
ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &size, NULL,
MLX5VF_QUERY_INC | MLX5VF_QUERY_CLEANUP);
if (ret)
return ERR_PTR(ret);
buf = mlx5vf_get_data_buffer(migf, size, DMA_FROM_DEVICE);
if (IS_ERR(buf))
return ERR_CAST(buf);
/* pre_copy cleanup */
ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, false, false);
if (ret) {
mlx5vf_put_data_buffer(buf);
return ERR_PTR(ret);
}
mlx5vf_disable_fds(mvdev, &state);
return (state != MLX5_MIGF_STATE_ERROR) ? NULL : ERR_PTR(-EIO);
} }
if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) { if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) {
@@ -1237,14 +1176,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
} }
if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) { if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
if (!MLX5VF_PRE_COPY_SUPP(mvdev)) { mlx5vf_disable_fds(mvdev, NULL);
ret = mlx5vf_cmd_load_vhca_state(mvdev,
mvdev->resuming_migf,
mvdev->resuming_migf->buf[0]);
if (ret)
return ERR_PTR(ret);
}
mlx5vf_disable_fds(mvdev);
return NULL; return NULL;
} }
@@ -1289,7 +1221,7 @@ void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
mvdev->deferred_reset = false; mvdev->deferred_reset = false;
spin_unlock(&mvdev->reset_lock); spin_unlock(&mvdev->reset_lock);
mvdev->mig_state = VFIO_DEVICE_STATE_RUNNING; mvdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
mlx5vf_disable_fds(mvdev); mlx5vf_disable_fds(mvdev, NULL);
goto again; goto again;
} }
mutex_unlock(&mvdev->state_mutex); mutex_unlock(&mvdev->state_mutex);
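The reworked mlx5vf_disable_fds() takes an optional out-parameter so that the PRE_COPY to RUNNING path can learn whether the save file had already failed before it is torn down, and return -EIO in that case; other callers pass NULL. The userspace sketch below models only that control flow. The names migf_state, struct migf, disable_fd() and finish_precopy() are hypothetical, and the assumption that teardown marks the file as errored is made for illustration: it is the reason the snapshot has to be taken before teardown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's migration-file state. */
enum migf_state { MIGF_STATE_OK, MIGF_STATE_ERROR };

struct migf {
	enum migf_state state;
	int open;
};

/*
 * Tear the file down. @last_state is optional: callers that do not care
 * pass NULL. The snapshot is taken first because, in this sketch,
 * teardown overwrites the state with MIGF_STATE_ERROR.
 */
static void disable_fd(struct migf *migf, enum migf_state *last_state)
{
	if (last_state)
		*last_state = migf->state;
	migf->open = 0;
	migf->state = MIGF_STATE_ERROR;
}

/* Caller side: map the captured state to 0 or -EIO. */
static int finish_precopy(struct migf *migf)
{
	enum migf_state state;

	disable_fd(migf, &state);
	return state != MIGF_STATE_ERROR ? 0 : -EIO;
}
```

Passing NULL keeps the existing call sites unchanged in spirit, while only the one transition that needs the answer pays for it.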
......
# SPDX-License-Identifier: GPL-2.0-only
config NVGRACE_GPU_VFIO_PCI
tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
depends on ARM64 || (COMPILE_TEST && 64BIT)
select VFIO_PCI_CORE
help
VFIO support for the GPU in the NVIDIA Grace Hopper Superchip is
required to assign the GPU device to userspace using KVM/qemu/etc.
If you don't know what to do here, say N.
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu-vfio-pci.o
nvgrace-gpu-vfio-pci-y := main.o
@@ -607,7 +607,7 @@ int pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova,
mutex_lock(&pds_vfio->state_mutex); mutex_lock(&pds_vfio->state_mutex);
err = pds_vfio_dirty_sync(pds_vfio, dirty, iova, length); err = pds_vfio_dirty_sync(pds_vfio, dirty, iova, length);
pds_vfio_state_mutex_unlock(pds_vfio); mutex_unlock(&pds_vfio->state_mutex);
return err; return err;
} }
@@ -624,7 +624,7 @@ int pds_vfio_dma_logging_start(struct vfio_device *vdev,
mutex_lock(&pds_vfio->state_mutex); mutex_lock(&pds_vfio->state_mutex);
pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_IN_PROGRESS); pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_IN_PROGRESS);
err = pds_vfio_dirty_enable(pds_vfio, ranges, nnodes, page_size); err = pds_vfio_dirty_enable(pds_vfio, ranges, nnodes, page_size);
pds_vfio_state_mutex_unlock(pds_vfio); mutex_unlock(&pds_vfio->state_mutex);
return err; return err;
} }
@@ -637,7 +637,7 @@ int pds_vfio_dma_logging_stop(struct vfio_device *vdev)
mutex_lock(&pds_vfio->state_mutex); mutex_lock(&pds_vfio->state_mutex);
pds_vfio_dirty_disable(pds_vfio, true); pds_vfio_dirty_disable(pds_vfio, true);
pds_vfio_state_mutex_unlock(pds_vfio); mutex_unlock(&pds_vfio->state_mutex);
return 0; return 0;
} }
@@ -92,8 +92,10 @@ static void pds_vfio_put_lm_file(struct pds_vfio_lm_file *lm_file)
{ {
mutex_lock(&lm_file->lock); mutex_lock(&lm_file->lock);
lm_file->disabled = true;
lm_file->size = 0; lm_file->size = 0;
lm_file->alloc_size = 0; lm_file->alloc_size = 0;
lm_file->filep->f_pos = 0;
/* Free scatter list of file pages */ /* Free scatter list of file pages */
sg_free_table(&lm_file->sg_table); sg_free_table(&lm_file->sg_table);
@@ -183,6 +185,12 @@ static ssize_t pds_vfio_save_read(struct file *filp, char __user *buf,
pos = &filp->f_pos; pos = &filp->f_pos;
mutex_lock(&lm_file->lock); mutex_lock(&lm_file->lock);
if (lm_file->disabled) {
done = -ENODEV;
goto out_unlock;
}
if (*pos > lm_file->size) { if (*pos > lm_file->size) {
done = -EINVAL; done = -EINVAL;
goto out_unlock; goto out_unlock;
@@ -283,6 +291,11 @@ static ssize_t pds_vfio_restore_write(struct file *filp, const char __user *buf,
mutex_lock(&lm_file->lock); mutex_lock(&lm_file->lock);
if (lm_file->disabled) {
done = -ENODEV;
goto out_unlock;
}
while (len) { while (len) {
size_t page_offset; size_t page_offset;
struct page *page; struct page *page;
......
@@ -27,6 +27,7 @@ struct pds_vfio_lm_file {
struct scatterlist *last_offset_sg; /* Iterator */ struct scatterlist *last_offset_sg; /* Iterator */
unsigned int sg_last_entry; unsigned int sg_last_entry;
unsigned long last_offset; unsigned long last_offset;
bool disabled;
}; };
struct pds_vfio_pci_device; struct pds_vfio_pci_device;
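The pds-vfio-pci change adds a disabled flag to struct pds_vfio_lm_file: pds_vfio_put_lm_file() sets it under lm_file->lock, and the save-read and restore-write handlers check it under the same lock, so a user still holding a stale save or restore fd gets -ENODEV instead of touching released pages. A minimal userspace model of that pattern follows, assuming pthread mutexes in place of kernel mutexes and a fixed array in place of the scatter list; struct lm_file, put_lm_file() and lm_file_read() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace model of a save/restore migration file. */
struct lm_file {
	pthread_mutex_t lock;
	bool disabled;     /* set once the backing data has been released */
	size_t size;
	char data[64];
};

/* Modeled on pds_vfio_put_lm_file(): mark disabled, drop contents. */
static void put_lm_file(struct lm_file *f)
{
	pthread_mutex_lock(&f->lock);
	f->disabled = true;
	f->size = 0;
	pthread_mutex_unlock(&f->lock);
}

/* Modeled on pds_vfio_save_read(): refuse access once disabled. */
static ssize_t lm_file_read(struct lm_file *f, char *out, size_t len,
			    size_t pos)
{
	ssize_t done;

	pthread_mutex_lock(&f->lock);
	if (f->disabled) {
		done = -ENODEV;
		goto out_unlock;
	}
	if (pos > f->size) {
		done = -EINVAL;
		goto out_unlock;
	}
	if (len > f->size - pos)
		len = f->size - pos;
	memcpy(out, f->data + pos, len);
	done = (ssize_t)len;
out_unlock:
	pthread_mutex_unlock(&f->lock);
	return done;
}
```

Checking the flag under the same lock that the release path takes is what closes the race: a reader either completes before the release or observes disabled == true.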
......
@@ -21,16 +21,13 @@
static void pds_vfio_recovery(struct pds_vfio_pci_device *pds_vfio) static void pds_vfio_recovery(struct pds_vfio_pci_device *pds_vfio)
{ {
bool deferred_reset_needed = false;
/* /*
* Documentation states that the kernel migration driver must not * Documentation states that the kernel migration driver must not
* generate asynchronous device state transitions outside of * generate asynchronous device state transitions outside of
* manipulation by the user or the VFIO_DEVICE_RESET ioctl. * manipulation by the user or the VFIO_DEVICE_RESET ioctl.
* *
* Since recovery is an asynchronous event received from the device, * Since recovery is an asynchronous event received from the device,
* initiate a deferred reset. Issue a deferred reset in the following * initiate a reset in the following situations:
* situations:
* 1. Migration is in progress, which will cause the next step of * 1. Migration is in progress, which will cause the next step of
* the migration to fail. * the migration to fail.
* 2. If the device is in a state that will be set to * 2. If the device is in a state that will be set to
@@ -42,24 +39,8 @@ static void pds_vfio_recovery(struct pds_vfio_pci_device *pds_vfio)
pds_vfio->state != VFIO_DEVICE_STATE_ERROR) || pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
(pds_vfio->state == VFIO_DEVICE_STATE_RUNNING && (pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
pds_vfio_dirty_is_enabled(pds_vfio))) pds_vfio_dirty_is_enabled(pds_vfio)))
deferred_reset_needed = true; pds_vfio_reset(pds_vfio, VFIO_DEVICE_STATE_ERROR);
mutex_unlock(&pds_vfio->state_mutex); mutex_unlock(&pds_vfio->state_mutex);
/*
* On the next user initiated state transition, the device will
* transition to the VFIO_DEVICE_STATE_ERROR. At this point it's the user's
* responsibility to reset the device.
*
* If a VFIO_DEVICE_RESET is requested post recovery and before the next
* state transition, then the deferred reset state will be set to
* VFIO_DEVICE_STATE_RUNNING.
*/
if (deferred_reset_needed) {
mutex_lock(&pds_vfio->reset_mutex);
pds_vfio->deferred_reset = true;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_ERROR;
mutex_unlock(&pds_vfio->reset_mutex);
}
} }
static int pds_vfio_pci_notify_handler(struct notifier_block *nb, static int pds_vfio_pci_notify_handler(struct notifier_block *nb,
@@ -185,7 +166,9 @@ static void pds_vfio_pci_aer_reset_done(struct pci_dev *pdev)
{ {
struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev); struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev);
pds_vfio_reset(pds_vfio); mutex_lock(&pds_vfio->state_mutex);
pds_vfio_reset(pds_vfio, VFIO_DEVICE_STATE_RUNNING);
mutex_unlock(&pds_vfio->state_mutex);
} }
static const struct pci_error_handlers pds_vfio_pci_err_handlers = { static const struct pci_error_handlers pds_vfio_pci_err_handlers = {
......
@@ -26,37 +26,14 @@ struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev)
vfio_coredev); vfio_coredev);
} }
void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio) void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio,
enum vfio_device_mig_state state)
{ {
again: pds_vfio_put_restore_file(pds_vfio);
mutex_lock(&pds_vfio->reset_mutex); pds_vfio_put_save_file(pds_vfio);
if (pds_vfio->deferred_reset) { if (state == VFIO_DEVICE_STATE_ERROR)
pds_vfio->deferred_reset = false; pds_vfio_dirty_disable(pds_vfio, false);
if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) { pds_vfio->state = state;
pds_vfio_put_restore_file(pds_vfio);
pds_vfio_put_save_file(pds_vfio);
pds_vfio_dirty_disable(pds_vfio, false);
}
pds_vfio->state = pds_vfio->deferred_reset_state;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
mutex_unlock(&pds_vfio->reset_mutex);
goto again;
}
mutex_unlock(&pds_vfio->state_mutex);
mutex_unlock(&pds_vfio->reset_mutex);
}
void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio)
{
mutex_lock(&pds_vfio->reset_mutex);
pds_vfio->deferred_reset = true;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
if (!mutex_trylock(&pds_vfio->state_mutex)) {
mutex_unlock(&pds_vfio->reset_mutex);
return;
}
mutex_unlock(&pds_vfio->reset_mutex);
pds_vfio_state_mutex_unlock(pds_vfio);
} }
static struct file * static struct file *
@@ -97,8 +74,7 @@ pds_vfio_set_device_state(struct vfio_device *vdev,
break; break;
} }
} }
pds_vfio_state_mutex_unlock(pds_vfio); mutex_unlock(&pds_vfio->state_mutex);
/* still waiting on a deferred_reset */
if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR)
res = ERR_PTR(-EIO); res = ERR_PTR(-EIO);
@@ -114,7 +90,7 @@ static int pds_vfio_get_device_state(struct vfio_device *vdev,
mutex_lock(&pds_vfio->state_mutex); mutex_lock(&pds_vfio->state_mutex);
*current_state = pds_vfio->state; *current_state = pds_vfio->state;
pds_vfio_state_mutex_unlock(pds_vfio); mutex_unlock(&pds_vfio->state_mutex);
return 0; return 0;
} }
@@ -156,7 +132,6 @@ static int pds_vfio_init_device(struct vfio_device *vdev)
pds_vfio->vf_id = vf_id; pds_vfio->vf_id = vf_id;
mutex_init(&pds_vfio->state_mutex); mutex_init(&pds_vfio->state_mutex);
mutex_init(&pds_vfio->reset_mutex);
vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P; vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
vdev->mig_ops = &pds_vfio_lm_ops; vdev->mig_ops = &pds_vfio_lm_ops;
@@ -178,7 +153,6 @@ static void pds_vfio_release_device(struct vfio_device *vdev)
vfio_coredev.vdev); vfio_coredev.vdev);
mutex_destroy(&pds_vfio->state_mutex); mutex_destroy(&pds_vfio->state_mutex);
mutex_destroy(&pds_vfio->reset_mutex);
vfio_pci_core_release_dev(vdev); vfio_pci_core_release_dev(vdev);
} }
@@ -194,7 +168,6 @@ static int pds_vfio_open_device(struct vfio_device *vdev)
return err; return err;
pds_vfio->state = VFIO_DEVICE_STATE_RUNNING; pds_vfio->state = VFIO_DEVICE_STATE_RUNNING;
pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev); vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev);
......
@@ -18,20 +18,16 @@ struct pds_vfio_pci_device {
struct pds_vfio_dirty dirty; struct pds_vfio_dirty dirty;
struct mutex state_mutex; /* protect migration state */ struct mutex state_mutex; /* protect migration state */
enum vfio_device_mig_state state; enum vfio_device_mig_state state;
struct mutex reset_mutex; /* protect reset_done flow */
u8 deferred_reset;
enum vfio_device_mig_state deferred_reset_state;
struct notifier_block nb; struct notifier_block nb;
int vf_id; int vf_id;
u16 client_id; u16 client_id;
}; };
void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio);
const struct vfio_device_ops *pds_vfio_ops_info(void); const struct vfio_device_ops *pds_vfio_ops_info(void);
struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev); struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev);
void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio); void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio,
enum vfio_device_mig_state state);
struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio); struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio);
struct device *pds_vfio_to_dev(struct pds_vfio_pci_device *pds_vfio); struct device *pds_vfio_to_dev(struct pds_vfio_pci_device *pds_vfio);
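With the deferred-reset machinery gone, pds_vfio_reset() runs with state_mutex already held and takes the target state as an argument: the AER reset_done handler locks, resets to RUNNING and unlocks, while the recovery path resets to ERROR in place. The sketch below models that locking contract in userspace. The reduced state enum, struct dev, and the recovery condition are assumptions rather than the driver's definitions; in particular the diff elides part of the real recovery check, so the one here is a simplified, hypothetical stand-in.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Reduced, hypothetical subset of enum vfio_device_mig_state. */
enum mig_state { STATE_RUNNING, STATE_STOP_COPY, STATE_ERROR };

struct dev {
	pthread_mutex_t state_mutex;
	enum mig_state state;
	bool files_open;     /* save/restore files still held */
	bool dirty_enabled;  /* dirty page tracking active */
};

/* Modeled on the new pds_vfio_reset(): caller holds state_mutex. */
static void dev_reset(struct dev *d, enum mig_state state)
{
	d->files_open = false;            /* put restore and save files */
	if (state == STATE_ERROR)
		d->dirty_enabled = false; /* stop tracking only on error */
	d->state = state;
}

/* Modeled on pds_vfio_pci_aer_reset_done(): synchronous reset. */
static void aer_reset_done(struct dev *d)
{
	pthread_mutex_lock(&d->state_mutex);
	dev_reset(d, STATE_RUNNING);
	pthread_mutex_unlock(&d->state_mutex);
}

/*
 * Modeled on pds_vfio_recovery(): reset to ERROR immediately instead of
 * setting a deferred flag. The condition is simplified and hypothetical.
 */
static void recovery(struct dev *d)
{
	pthread_mutex_lock(&d->state_mutex);
	if (d->state == STATE_STOP_COPY ||
	    (d->state == STATE_RUNNING && d->dirty_enabled))
		dev_reset(d, STATE_ERROR);
	pthread_mutex_unlock(&d->state_mutex);
}
```

Because every caller now wraps the reset in state_mutex, the separate reset_mutex, the deferred_reset fields and the looping pds_vfio_state_mutex_unlock() helper have nothing left to protect, which is why the header hunk drops them.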
......
@@ -1966,3 +1966,45 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
return done; return done;
} }
/**
* vfio_pci_core_range_intersect_range() - Determine overlap between a buffer
* and register offset ranges.
* @buf_start: start offset of the buffer
* @buf_cnt: number of buffer bytes
* @reg_start: start register offset
* @reg_cnt: number of register bytes
* @buf_offset: start offset of overlap in the buffer
* @intersect_count: number of overlapping bytes
* @register_offset: start offset of overlap in register
*
* Returns: true if there is overlap, false if not.
* The overlap start and size is returned through function args.
*/
bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
loff_t reg_start, size_t reg_cnt,
loff_t *buf_offset,
size_t *intersect_count,
size_t *register_offset)
{
if (buf_start <= reg_start &&
buf_start + buf_cnt > reg_start) {
*buf_offset = reg_start - buf_start;
*intersect_count = min_t(size_t, reg_cnt,
buf_start + buf_cnt - reg_start);
*register_offset = 0;
return true;
}
if (buf_start > reg_start &&
buf_start < reg_start + reg_cnt) {
*buf_offset = 0;
*intersect_count = min_t(size_t, buf_cnt,
reg_start + reg_cnt - buf_start);
*register_offset = buf_start - reg_start;
return true;
}
return false;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_range_intersect_range);
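vfio_pci_core_range_intersect_range() is the former range_intersect_range() from virtio-vfio-pci, moved into vfio-pci-core and exported so other variant drivers can reuse it. The userspace sketch below reproduces its two cases (buffer starting at or before the register, and buffer starting inside it) with off_t in place of loff_t. It also adds a hypothetical overlay_u16() showing how a config-space read handler might splice a virtualized 16-bit register into the user's buffer, much as virtiovf_pci_read_config() does with copy_to_user().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Same two cases as vfio_pci_core_range_intersect_range(). */
static bool range_intersect(off_t buf_start, size_t buf_cnt,
			    off_t reg_start, size_t reg_cnt,
			    off_t *buf_offset, size_t *intersect_count,
			    size_t *register_offset)
{
	/* Buffer begins at or before the register and reaches into it. */
	if (buf_start <= reg_start &&
	    buf_start + (off_t)buf_cnt > reg_start) {
		*buf_offset = reg_start - buf_start;
		*intersect_count = min_size(reg_cnt,
				(size_t)(buf_start + (off_t)buf_cnt - reg_start));
		*register_offset = 0;
		return true;
	}
	/* Buffer begins inside the register. */
	if (buf_start > reg_start &&
	    buf_start < reg_start + (off_t)reg_cnt) {
		*buf_offset = 0;
		*intersect_count = min_size(buf_cnt,
				(size_t)(reg_start + (off_t)reg_cnt - buf_start));
		*register_offset = (size_t)(buf_start - reg_start);
		return true;
	}
	return false;
}

/*
 * Hypothetical helper: overwrite the bytes of a little-endian 16-bit
 * register at config offset @reg inside a buffer read from offset @pos.
 */
static void overlay_u16(unsigned char *buf, off_t pos, size_t count,
			off_t reg, uint16_t val)
{
	unsigned char le[2] = { (unsigned char)(val & 0xff),
				(unsigned char)(val >> 8) };
	off_t buf_off;
	size_t n, reg_off;

	if (range_intersect(pos, count, reg, sizeof(le),
			    &buf_off, &n, &reg_off))
		memcpy(buf + buf_off, le + reg_off, n);
}
```

Returning the register offset as well as the buffer offset is what lets a partial access, such as a 1-byte read of the high half of a 16-bit register, copy the right byte.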
@@ -2064,6 +2064,7 @@ static int vfio_pci_bus_notifier(struct notifier_block *nb,
pci_name(pdev)); pci_name(pdev));
pdev->driver_override = kasprintf(GFP_KERNEL, "%s", pdev->driver_override = kasprintf(GFP_KERNEL, "%s",
vdev->vdev.ops->name); vdev->vdev.ops->name);
WARN_ON(!pdev->driver_override);
} else if (action == BUS_NOTIFY_BOUND_DRIVER && } else if (action == BUS_NOTIFY_BOUND_DRIVER &&
pdev->is_virtfn && physfn == vdev->pdev) { pdev->is_virtfn && physfn == vdev->pdev) {
struct pci_driver *drv = pci_dev_driver(pdev); struct pci_driver *drv = pci_dev_driver(pdev);
......
@@ -90,22 +90,28 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
if (likely(is_intx(vdev) && !vdev->virq_disabled)) { if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
struct vfio_pci_irq_ctx *ctx; struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
ctx = vfio_irq_ctx_get(vdev, 0); ctx = vfio_irq_ctx_get(vdev, 0);
if (WARN_ON_ONCE(!ctx)) if (WARN_ON_ONCE(!ctx))
return; return;
eventfd_signal(ctx->trigger);
trigger = READ_ONCE(ctx->trigger);
if (likely(trigger))
eventfd_signal(trigger);
} }
} }
/* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */ /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev) static bool __vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
{ {
struct pci_dev *pdev = vdev->pdev; struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx; struct vfio_pci_irq_ctx *ctx;
unsigned long flags; unsigned long flags;
bool masked_changed = false; bool masked_changed = false;
lockdep_assert_held(&vdev->igate);
spin_lock_irqsave(&vdev->irqlock, flags); spin_lock_irqsave(&vdev->irqlock, flags);
/* /*
@@ -143,6 +149,17 @@ bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
return masked_changed; return masked_changed;
} }
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
{
bool mask_changed;
mutex_lock(&vdev->igate);
mask_changed = __vfio_pci_intx_mask(vdev);
mutex_unlock(&vdev->igate);
return mask_changed;
}
/* /*
* If this is triggered by an eventfd, we can't call eventfd_signal * If this is triggered by an eventfd, we can't call eventfd_signal
* or else we'll deadlock on the eventfd wait queue. Return >0 when * or else we'll deadlock on the eventfd wait queue. Return >0 when
@@ -194,12 +211,21 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
return ret; return ret;
} }
void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev) static void __vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
{ {
lockdep_assert_held(&vdev->igate);
if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0) if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
vfio_send_intx_eventfd(vdev, NULL); vfio_send_intx_eventfd(vdev, NULL);
} }
void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
{
mutex_lock(&vdev->igate);
__vfio_pci_intx_unmask(vdev);
mutex_unlock(&vdev->igate);
}
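The INTx mask and unmask helpers are split into a double-underscore variant that expects vdev->igate to be held (checked with lockdep_assert_held()) and a public wrapper that takes igate itself; vfio_pci_set_intx_mask() and vfio_pci_set_intx_unmask() switch to the double-underscore variants. A userspace sketch of the same split follows. A plain boolean stands in for lockdep's ownership tracking, intx_mask_locked() plays the role of __vfio_pci_intx_mask(), and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the INTx state guarded by vdev->igate. */
struct intx_dev {
	pthread_mutex_t igate;
	bool igate_held;  /* crude stand-in for lockdep ownership tracking */
	bool masked;
};

static void igate_lock(struct intx_dev *d)
{
	pthread_mutex_lock(&d->igate);
	d->igate_held = true;
}

static void igate_unlock(struct intx_dev *d)
{
	d->igate_held = false;
	pthread_mutex_unlock(&d->igate);
}

/* Role of __vfio_pci_intx_mask(): caller must already hold igate. */
static bool intx_mask_locked(struct intx_dev *d)
{
	bool changed;

	assert(d->igate_held);  /* cf. lockdep_assert_held(&vdev->igate) */
	changed = !d->masked;
	d->masked = true;
	return changed;
}

/* Role of vfio_pci_intx_mask(): takes igate itself. */
static bool intx_mask(struct intx_dev *d)
{
	bool changed;

	igate_lock(d);
	changed = intx_mask_locked(d);
	igate_unlock(d);
	return changed;
}
```

A path that already holds igate must call the locked variant; calling the wrapper from under igate would deadlock on the non-recursive mutex, which is why the ioctl handlers switch.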
static irqreturn_t vfio_intx_handler(int irq, void *dev_id) static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
{ {
struct vfio_pci_core_device *vdev = dev_id; struct vfio_pci_core_device *vdev = dev_id;
@@ -231,97 +257,100 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
return ret; return ret;
} }
static int vfio_intx_enable(struct vfio_pci_core_device *vdev) static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
struct eventfd_ctx *trigger)
{ {
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx; struct vfio_pci_irq_ctx *ctx;
unsigned long irqflags;
char *name;
int ret;
if (!is_irq_none(vdev)) if (!is_irq_none(vdev))
return -EINVAL; return -EINVAL;
if (!vdev->pdev->irq) if (!pdev->irq)
return -ENODEV; return -ENODEV;
name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
if (!name)
return -ENOMEM;
ctx = vfio_irq_ctx_alloc(vdev, 0); ctx = vfio_irq_ctx_alloc(vdev, 0);
if (!ctx) if (!ctx)
return -ENOMEM; return -ENOMEM;
ctx->name = name;
ctx->trigger = trigger;
/* /*
* If the virtual interrupt is masked, restore it. Devices * Fill the initial masked state based on virq_disabled. After
* supporting DisINTx can be masked at the hardware level * enable, changing the DisINTx bit in vconfig directly changes INTx
* here, non-PCI-2.3 devices will have to wait until the * masking. igate prevents races during setup, once running masked
* interrupt is enabled. * is protected via irqlock.
*
* Devices supporting DisINTx also reflect the current mask state in
* the physical DisINTx bit, which is not affected during IRQ setup.
*
* Devices without DisINTx support require an exclusive interrupt.
* IRQ masking is performed at the IRQ chip. Again, igate protects
* against races during setup and IRQ handlers and irqfds are not
* yet active, therefore masked is stable and can be used to
* conditionally auto-enable the IRQ.
*
* irq_type must be stable while the IRQ handler is registered,
* therefore it must be set before request_irq().
*/ */
ctx->masked = vdev->virq_disabled; ctx->masked = vdev->virq_disabled;
if (vdev->pci_2_3) if (vdev->pci_2_3) {
pci_intx(vdev->pdev, !ctx->masked); pci_intx(pdev, !ctx->masked);
irqflags = IRQF_SHARED;
} else {
irqflags = ctx->masked ? IRQF_NO_AUTOEN : 0;
}
vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX; vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
ret = request_irq(pdev->irq, vfio_intx_handler,
irqflags, ctx->name, vdev);
if (ret) {
vdev->irq_type = VFIO_PCI_NUM_IRQS;
kfree(name);
vfio_irq_ctx_free(vdev, ctx, 0);
return ret;
}
return 0; return 0;
} }
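vfio_intx_enable() now calls request_irq() itself and derives the flags from the initial mask state, replacing the old vfio_intx_set_signal() sequence of requesting the IRQ and then calling disable_irq_nosync() for a masked non-PCI-2.3 device. A sketch of just that decision follows; MY_IRQF_SHARED and MY_IRQF_NO_AUTOEN are placeholder bits, not the real IRQF_* values from <linux/interrupt.h>.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bits; the real IRQF_* values come from <linux/interrupt.h>. */
#define MY_IRQF_SHARED    (1u << 0)
#define MY_IRQF_NO_AUTOEN (1u << 1)

/*
 * Flag choice modeled on the reworked vfio_intx_enable(): a PCI 2.3
 * device masks through DisINTx and can share its line; any other device
 * needs an exclusive line, left disabled at request time if the virtual
 * interrupt starts out masked.
 */
static unsigned int intx_irqflags(bool pci_2_3, bool masked)
{
	if (pci_2_3)
		return MY_IRQF_SHARED;
	return masked ? MY_IRQF_NO_AUTOEN : 0u;
}
```

Requesting the line already disabled avoids the short interval in the old ordering where the IRQ was enabled between request_irq() and disable_irq_nosync().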
static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd) static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
struct eventfd_ctx *trigger)
{ {
struct pci_dev *pdev = vdev->pdev; struct pci_dev *pdev = vdev->pdev;
unsigned long irqflags = IRQF_SHARED;
struct vfio_pci_irq_ctx *ctx; struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger; struct eventfd_ctx *old;
unsigned long flags;
int ret;
ctx = vfio_irq_ctx_get(vdev, 0); ctx = vfio_irq_ctx_get(vdev, 0);
if (WARN_ON_ONCE(!ctx)) if (WARN_ON_ONCE(!ctx))
return -EINVAL; return -EINVAL;
if (ctx->trigger) { old = ctx->trigger;
free_irq(pdev->irq, vdev);
kfree(ctx->name);
eventfd_ctx_put(ctx->trigger);
ctx->trigger = NULL;
}
if (fd < 0) /* Disable only */
return 0;
ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", WRITE_ONCE(ctx->trigger, trigger);
pci_name(pdev));
if (!ctx->name)
return -ENOMEM;
trigger = eventfd_ctx_fdget(fd); /* Releasing an old ctx requires synchronizing in-flight users */
if (IS_ERR(trigger)) { if (old) {
kfree(ctx->name); synchronize_irq(pdev->irq);
return PTR_ERR(trigger); vfio_virqfd_flush_thread(&ctx->unmask);
eventfd_ctx_put(old);
} }
ctx->trigger = trigger;
if (!vdev->pci_2_3)
irqflags = 0;
ret = request_irq(pdev->irq, vfio_intx_handler,
irqflags, ctx->name, vdev);
if (ret) {
ctx->trigger = NULL;
kfree(ctx->name);
eventfd_ctx_put(trigger);
return ret;
}
/*
* INTx disable will stick across the new irq setup,
* disable_irq won't.
*/
spin_lock_irqsave(&vdev->irqlock, flags);
if (!vdev->pci_2_3 && ctx->masked)
disable_irq_nosync(pdev->irq);
spin_unlock_irqrestore(&vdev->irqlock, flags);
return 0; return 0;
} }
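vfio_intx_set_signal() no longer frees and re-requests the IRQ to change the eventfd. It publishes the new trigger with WRITE_ONCE() and, only if there was an old one, waits for in-flight users (synchronize_irq() plus vfio_virqfd_flush_thread()) before eventfd_ctx_put(); readers such as vfio_send_intx_eventfd() load the pointer once with READ_ONCE() and tolerate NULL. A userspace sketch of that publish, wait, release order using C11 atomics follows. struct trigger, synchronize_readers() and the integer reference count are hypothetical stand-ins, and the acquire/release orderings used here are stronger than READ_ONCE()/WRITE_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted stand-in for struct eventfd_ctx. */
struct trigger {
	int refs;
	int signals;
};

struct irq_ctx {
	_Atomic(struct trigger *) trigger;
};

/* Hypothetical stand-in for synchronize_irq() + vfio_virqfd_flush_thread(). */
static int sync_calls;

static void synchronize_readers(void)
{
	sync_calls++;
}

/* Reader side, like vfio_send_intx_eventfd(): load once, tolerate NULL. */
static void send_eventfd(struct irq_ctx *ctx)
{
	struct trigger *t = atomic_load_explicit(&ctx->trigger,
						 memory_order_acquire);

	if (t)
		t->signals++;
}

/*
 * Writer side, like the new vfio_intx_set_signal(): publish first, wait
 * out in-flight readers, and only then drop the old reference.
 */
static void set_signal(struct irq_ctx *ctx, struct trigger *new_trigger)
{
	struct trigger *old = atomic_exchange_explicit(&ctx->trigger,
						       new_trigger,
						       memory_order_acq_rel);

	if (old) {
		synchronize_readers();
		old->refs--;  /* stand-in for eventfd_ctx_put() */
	}
}
```

Passing NULL as the new trigger gives the "disable only" case without any special handling.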
static void vfio_intx_disable(struct vfio_pci_core_device *vdev) static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
{ {
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx; struct vfio_pci_irq_ctx *ctx;
ctx = vfio_irq_ctx_get(vdev, 0); ctx = vfio_irq_ctx_get(vdev, 0);
@@ -329,10 +358,13 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
if (ctx) { if (ctx) {
vfio_virqfd_disable(&ctx->unmask); vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask); vfio_virqfd_disable(&ctx->mask);
free_irq(pdev->irq, vdev);
if (ctx->trigger)
eventfd_ctx_put(ctx->trigger);
kfree(ctx->name);
vfio_irq_ctx_free(vdev, ctx, 0);
} }
vfio_intx_set_signal(vdev, -1);
vdev->irq_type = VFIO_PCI_NUM_IRQS; vdev->irq_type = VFIO_PCI_NUM_IRQS;
vfio_irq_ctx_free(vdev, ctx, 0);
} }
/* /*
@@ -560,11 +592,11 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
return -EINVAL; return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_NONE) { if (flags & VFIO_IRQ_SET_DATA_NONE) {
vfio_pci_intx_unmask(vdev); __vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) { } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
uint8_t unmask = *(uint8_t *)data; uint8_t unmask = *(uint8_t *)data;
if (unmask) if (unmask)
vfio_pci_intx_unmask(vdev); __vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { } else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0); struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
int32_t fd = *(int32_t *)data; int32_t fd = *(int32_t *)data;
@@ -591,11 +623,11 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
return -EINVAL; return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_NONE) { if (flags & VFIO_IRQ_SET_DATA_NONE) {
vfio_pci_intx_mask(vdev); __vfio_pci_intx_mask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) { } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
uint8_t mask = *(uint8_t *)data; uint8_t mask = *(uint8_t *)data;
if (mask) if (mask)
vfio_pci_intx_mask(vdev); __vfio_pci_intx_mask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { } else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
return -ENOTTY; /* XXX implement me */ return -ENOTTY; /* XXX implement me */
} }
@@ -616,19 +648,23 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
return -EINVAL; return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
struct eventfd_ctx *trigger = NULL;
int32_t fd = *(int32_t *)data; int32_t fd = *(int32_t *)data;
int ret; int ret;
if (is_intx(vdev)) if (fd >= 0) {
return vfio_intx_set_signal(vdev, fd); trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger))
return PTR_ERR(trigger);
}
ret = vfio_intx_enable(vdev); if (is_intx(vdev))
if (ret) ret = vfio_intx_set_signal(vdev, trigger);
return ret; else
ret = vfio_intx_enable(vdev, trigger);
ret = vfio_intx_set_signal(vdev, fd); if (ret && trigger)
if (ret) eventfd_ctx_put(trigger);
vfio_intx_disable(vdev);
return ret; return ret;
} }
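vfio_pci_set_intx_trigger() now resolves the eventfd once, before choosing between vfio_intx_set_signal() and vfio_intx_enable(), and drops the reference only if the chosen callee fails; on success the callee owns it. A userspace sketch of that ownership rule follows, with a hypothetical handle table standing in for eventfd_ctx_fdget() and a fail flag to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted handle standing in for struct eventfd_ctx. */
struct handle {
	int refs;
};

/* Stand-in for eventfd_ctx_fdget(): look up @fd and take a reference. */
static int handle_get(struct handle *table, int nr, int fd,
		      struct handle **out)
{
	if (fd >= nr)
		return -EBADF;
	table[fd].refs++;
	*out = &table[fd];
	return 0;
}

static void handle_put(struct handle *h)
{
	h->refs--;
}

/*
 * Hypothetical consumer: on success it owns @h (which may be NULL) and
 * releases whatever it held before; on failure @h stays with the caller.
 */
static int install(struct handle **slot, struct handle *h, bool fail)
{
	if (fail)
		return -ENODEV;
	if (*slot)
		handle_put(*slot);
	*slot = h;
	return 0;
}

/* Caller pattern modeled on vfio_pci_set_intx_trigger(). */
static int set_trigger(struct handle *table, int nr, int fd,
		       struct handle **slot, bool fail)
{
	struct handle *h = NULL;
	int ret;

	if (fd >= 0) {  /* negative fd means "no trigger" */
		ret = handle_get(table, nr, fd, &h);
		if (ret)
			return ret;
	}
	ret = install(slot, h, fail);
	if (ret && h)
		handle_put(h);  /* callee refused ownership */
	return ret;
}
```

Taking the reference in one place means neither callee has to parse a file descriptor, and every error path releases it exactly once.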
......
@@ -96,10 +96,10 @@ VFIO_IOREAD(32)
* reads with -1. This is intended for handling MSI-X vector tables and * reads with -1. This is intended for handling MSI-X vector tables and
* leftover space for ROM BARs. * leftover space for ROM BARs.
*/ */
static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
void __iomem *io, char __user *buf, void __iomem *io, char __user *buf,
loff_t off, size_t count, size_t x_start, loff_t off, size_t count, size_t x_start,
size_t x_end, bool iswrite) size_t x_end, bool iswrite)
{ {
ssize_t done = 0; ssize_t done = 0;
int ret; int ret;
@@ -201,6 +201,7 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
return done; return done;
} }
EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
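do_io_rw() is renamed vfio_pci_core_do_io_rw() and exported so variant drivers can reuse it. Its excluded window [x_start, x_end) drops writes and fills reads with -1, that is 0xff bytes. The byte-granular sketch below models only that window on an ordinary array; the real helper performs 1, 2, 4 or 8 byte __iomem accesses and also handles the test_mem check and user-copy faults, which this sketch omits. region_rw() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Byte-granular model of an accessor with an excluded window
 * [x_start, x_end): writes there are dropped and reads return 0xff.
 * The caller must keep off + count within the region.
 */
static size_t region_rw(unsigned char *region, unsigned char *buf,
			size_t off, size_t count,
			size_t x_start, size_t x_end, bool iswrite)
{
	for (size_t i = 0; i < count; i++) {
		size_t pos = off + i;
		bool excluded = pos >= x_start && pos < x_end;

		if (iswrite) {
			if (!excluded)
				region[pos] = buf[i];
		} else {
			buf[i] = excluded ? 0xff : region[pos];
		}
	}
	return count;
}
```

Passing x_start == x_end, as vfio_pci_vga_rw() does with 0 and 0, leaves nothing excluded.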
int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
{ {
@@ -279,8 +280,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
x_end = vdev->msix_offset + vdev->msix_size; x_end = vdev->msix_offset + vdev->msix_size;
} }
done = do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, buf, pos, done = vfio_pci_core_do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, buf, pos,
count, x_start, x_end, iswrite); count, x_start, x_end, iswrite);
if (done >= 0) if (done >= 0)
*ppos += done; *ppos += done;
@@ -348,7 +349,8 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
* probing, so we don't currently worry about access in relation * probing, so we don't currently worry about access in relation
* to the memory enable bit in the command register. * to the memory enable bit in the command register.
*/ */
done = do_io_rw(vdev, false, iomem, buf, off, count, 0, 0, iswrite); done = vfio_pci_core_do_io_rw(vdev, false, iomem, buf, off, count,
0, 0, iswrite);
vga_put(vdev->pdev, rsrc); vga_put(vdev->pdev, rsrc);
......
@@ -132,33 +132,6 @@ virtiovf_pci_bar0_rw(struct virtiovf_pci_core_device *virtvdev,
return ret ? ret : count; return ret ? ret : count;
} }
static bool range_intersect_range(loff_t range1_start, size_t count1,
loff_t range2_start, size_t count2,
loff_t *start_offset,
size_t *intersect_count,
size_t *register_offset)
{
if (range1_start <= range2_start &&
range1_start + count1 > range2_start) {
*start_offset = range2_start - range1_start;
*intersect_count = min_t(size_t, count2,
range1_start + count1 - range2_start);
*register_offset = 0;
return true;
}
if (range1_start > range2_start &&
range1_start < range2_start + count2) {
*start_offset = 0;
*intersect_count = min_t(size_t, count1,
range2_start + count2 - range1_start);
*register_offset = range1_start - range2_start;
return true;
}
return false;
}
static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev, static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
char __user *buf, size_t count, char __user *buf, size_t count,
loff_t *ppos) loff_t *ppos)
@@ -178,16 +151,18 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
if (ret < 0) if (ret < 0)
return ret; return ret;
if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16), if (vfio_pci_core_range_intersect_range(pos, count, PCI_DEVICE_ID,
&copy_offset, &copy_count, &register_offset)) { sizeof(val16), &copy_offset,
&copy_count, &register_offset)) {
val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET); val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count)) if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
return -EFAULT; return -EFAULT;
} }
if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) && if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16), vfio_pci_core_range_intersect_range(pos, count, PCI_COMMAND,
&copy_offset, &copy_count, &register_offset)) { sizeof(val16), &copy_offset,
&copy_count, &register_offset)) {
if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset, if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
copy_count)) copy_count))
return -EFAULT; return -EFAULT;
...@@ -197,16 +172,18 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev, ...@@ -197,16 +172,18 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
return -EFAULT; return -EFAULT;
} }
if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8), if (vfio_pci_core_range_intersect_range(pos, count, PCI_REVISION_ID,
&copy_offset, &copy_count, &register_offset)) { sizeof(val8), &copy_offset,
&copy_count, &register_offset)) {
/* Transional needs to have revision 0 */ /* Transional needs to have revision 0 */
val8 = 0; val8 = 0;
if (copy_to_user(buf + copy_offset, &val8, copy_count)) if (copy_to_user(buf + copy_offset, &val8, copy_count))
return -EFAULT; return -EFAULT;
} }
if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32), if (vfio_pci_core_range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
&copy_offset, &copy_count, &register_offset)) { sizeof(val32), &copy_offset,
&copy_count, &register_offset)) {
u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1); u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0); u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
...@@ -215,8 +192,9 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev, ...@@ -215,8 +192,9 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
return -EFAULT; return -EFAULT;
} }
if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16), if (vfio_pci_core_range_intersect_range(pos, count, PCI_SUBSYSTEM_ID,
&copy_offset, &copy_count, &register_offset)) { sizeof(val16), &copy_offset,
&copy_count, &register_offset)) {
/* /*
* Transitional devices use the PCI subsystem device id as * Transitional devices use the PCI subsystem device id as
* virtio device id, same as legacy driver always did. * virtio device id, same as legacy driver always did.
...@@ -227,8 +205,9 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev, ...@@ -227,8 +205,9 @@ static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
return -EFAULT; return -EFAULT;
} }
if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16), if (vfio_pci_core_range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID,
&copy_offset, &copy_count, &register_offset)) { sizeof(val16), &copy_offset,
&copy_count, &register_offset)) {
val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET); val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
copy_count)) copy_count))
...@@ -270,19 +249,20 @@ static ssize_t virtiovf_pci_write_config(struct vfio_device *core_vdev, ...@@ -270,19 +249,20 @@ static ssize_t virtiovf_pci_write_config(struct vfio_device *core_vdev,
loff_t copy_offset; loff_t copy_offset;
size_t copy_count; size_t copy_count;
if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd), if (vfio_pci_core_range_intersect_range(pos, count, PCI_COMMAND,
&copy_offset, &copy_count, sizeof(virtvdev->pci_cmd),
&register_offset)) { &copy_offset, &copy_count,
&register_offset)) {
if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset, if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
buf + copy_offset, buf + copy_offset,
copy_count)) copy_count))
return -EFAULT; return -EFAULT;
} }
if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, if (vfio_pci_core_range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
sizeof(virtvdev->pci_base_addr_0), sizeof(virtvdev->pci_base_addr_0),
&copy_offset, &copy_count, &copy_offset, &copy_count,
&register_offset)) { &register_offset)) {
if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset, if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
buf + copy_offset, buf + copy_offset,
copy_count)) copy_count))
......
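The virtio hunks drop the driver's private range_intersect_range() in favor of vfio_pci_core_range_intersect_range(), which is declared in the vfio_pci_core.h hunk with the same arguments. The overlap arithmetic is easy to check in isolation. Below is a direct userspace port of the removed function, renamed `range_intersect()` with loff_t replaced by long long. It is paired with a hypothetical `overlay_u16()` that patches a 16-bit little-endian register into a config-space read buffer, in the same way virtiovf_pci_read_config() handles PCI_DEVICE_ID (offset 0x02). The register value 0x1234 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Port of the removed helper: does the access [buf_start, buf_start +
 * buf_cnt) overlap the register [reg_start, reg_start + reg_cnt)? On
 * overlap, report where the overlap starts in the access buffer, how many
 * bytes overlap, and where the overlap starts inside the register.
 */
static bool range_intersect(long long buf_start, size_t buf_cnt,
			    long long reg_start, size_t reg_cnt,
			    long long *buf_offset, size_t *intersect_count,
			    size_t *register_offset)
{
	if (buf_start <= reg_start &&
	    buf_start + (long long)buf_cnt > reg_start) {
		size_t tail = (size_t)(buf_start + (long long)buf_cnt - reg_start);

		*buf_offset = reg_start - buf_start;
		*intersect_count = reg_cnt < tail ? reg_cnt : tail;
		*register_offset = 0;
		return true;
	}
	if (buf_start > reg_start &&
	    buf_start < reg_start + (long long)reg_cnt) {
		size_t tail = (size_t)(reg_start + (long long)reg_cnt - buf_start);

		*buf_offset = 0;
		*intersect_count = buf_cnt < tail ? buf_cnt : tail;
		*register_offset = (size_t)(buf_start - reg_start);
		return true;
	}
	return false;
}

/* Hypothetical: overwrite the bytes of a 16-bit register at "reg"
 * (stored little-endian, as in config space) inside a read buffer that
 * covers [pos, pos + count). */
static void overlay_u16(unsigned char *buf, long long pos, size_t count,
			long long reg, uint16_t val)
{
	unsigned char le[2] = { (unsigned char)(val & 0xff),
				(unsigned char)(val >> 8) };
	long long off;
	size_t n, reg_off;

	if (range_intersect(pos, count, reg, sizeof(le), &off, &n, &reg_off))
		memcpy(buf + off, le + reg_off, n);
}
```

The first branch covers an access that starts at or before the register, so the overlap begins at buf_offset inside the user buffer. The second covers an access that starts inside the register, so the overlap begins at register_offset inside the register. This is why a 1-byte read at offset 0x03 returns only the high byte.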
...@@ -122,16 +122,16 @@ static const struct vfio_device_ops vfio_amba_ops = { ...@@ -122,16 +122,16 @@ static const struct vfio_device_ops vfio_amba_ops = {
.detach_ioas = vfio_iommufd_physical_detach_ioas, .detach_ioas = vfio_iommufd_physical_detach_ioas,
}; };
static const struct amba_id pl330_ids[] = { static const struct amba_id vfio_amba_ids[] = {
{ 0, 0 }, { 0, 0 },
}; };
MODULE_DEVICE_TABLE(amba, pl330_ids); MODULE_DEVICE_TABLE(amba, vfio_amba_ids);
static struct amba_driver vfio_amba_driver = { static struct amba_driver vfio_amba_driver = {
.probe = vfio_amba_probe, .probe = vfio_amba_probe,
.remove = vfio_amba_remove, .remove = vfio_amba_remove,
.id_table = pl330_ids, .id_table = vfio_amba_ids,
.drv = { .drv = {
.name = "vfio-amba", .name = "vfio-amba",
.owner = THIS_MODULE, .owner = THIS_MODULE,
......
...@@ -85,14 +85,13 @@ static void vfio_platform_release_dev(struct vfio_device *core_vdev) ...@@ -85,14 +85,13 @@ static void vfio_platform_release_dev(struct vfio_device *core_vdev)
vfio_platform_release_common(vdev); vfio_platform_release_common(vdev);
} }
static int vfio_platform_remove(struct platform_device *pdev) static void vfio_platform_remove(struct platform_device *pdev)
{ {
struct vfio_platform_device *vdev = dev_get_drvdata(&pdev->dev); struct vfio_platform_device *vdev = dev_get_drvdata(&pdev->dev);
vfio_unregister_group_dev(&vdev->vdev); vfio_unregister_group_dev(&vdev->vdev);
pm_runtime_disable(vdev->device); pm_runtime_disable(vdev->device);
vfio_put_device(&vdev->vdev); vfio_put_device(&vdev->vdev);
return 0;
} }
static const struct vfio_device_ops vfio_platform_ops = { static const struct vfio_device_ops vfio_platform_ops = {
...@@ -113,7 +112,7 @@ static const struct vfio_device_ops vfio_platform_ops = { ...@@ -113,7 +112,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
static struct platform_driver vfio_platform_driver = { static struct platform_driver vfio_platform_driver = {
.probe = vfio_platform_probe, .probe = vfio_platform_probe,
.remove = vfio_platform_remove, .remove_new = vfio_platform_remove,
.driver = { .driver = {
.name = "vfio-platform", .name = "vfio-platform",
}, },
......
...@@ -136,6 +136,16 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev, ...@@ -136,6 +136,16 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
return 0; return 0;
} }
/*
* The trigger eventfd is guaranteed valid in the interrupt path
* and protected by the igate mutex when triggered via ioctl.
*/
static void vfio_send_eventfd(struct vfio_platform_irq *irq_ctx)
{
if (likely(irq_ctx->trigger))
eventfd_signal(irq_ctx->trigger);
}
static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id) static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
{ {
struct vfio_platform_irq *irq_ctx = dev_id; struct vfio_platform_irq *irq_ctx = dev_id;
...@@ -155,7 +165,7 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id) ...@@ -155,7 +165,7 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
spin_unlock_irqrestore(&irq_ctx->lock, flags); spin_unlock_irqrestore(&irq_ctx->lock, flags);
if (ret == IRQ_HANDLED) if (ret == IRQ_HANDLED)
eventfd_signal(irq_ctx->trigger); vfio_send_eventfd(irq_ctx);
return ret; return ret;
} }
...@@ -164,52 +174,40 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id) ...@@ -164,52 +174,40 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
{ {
struct vfio_platform_irq *irq_ctx = dev_id; struct vfio_platform_irq *irq_ctx = dev_id;
eventfd_signal(irq_ctx->trigger); vfio_send_eventfd(irq_ctx);
return IRQ_HANDLED; return IRQ_HANDLED;
} }
static int vfio_set_trigger(struct vfio_platform_device *vdev, int index, static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int fd, irq_handler_t handler) int fd)
{ {
struct vfio_platform_irq *irq = &vdev->irqs[index]; struct vfio_platform_irq *irq = &vdev->irqs[index];
struct eventfd_ctx *trigger; struct eventfd_ctx *trigger;
int ret;
if (irq->trigger) { if (irq->trigger) {
irq_clear_status_flags(irq->hwirq, IRQ_NOAUTOEN); disable_irq(irq->hwirq);
free_irq(irq->hwirq, irq);
kfree(irq->name);
eventfd_ctx_put(irq->trigger); eventfd_ctx_put(irq->trigger);
irq->trigger = NULL; irq->trigger = NULL;
} }
if (fd < 0) /* Disable only */ if (fd < 0) /* Disable only */
return 0; return 0;
irq->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-irq[%d](%s)",
irq->hwirq, vdev->name);
if (!irq->name)
return -ENOMEM;
trigger = eventfd_ctx_fdget(fd); trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger)) { if (IS_ERR(trigger))
kfree(irq->name);
return PTR_ERR(trigger); return PTR_ERR(trigger);
}
irq->trigger = trigger; irq->trigger = trigger;
irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN); /*
ret = request_irq(irq->hwirq, handler, 0, irq->name, irq); * irq->masked effectively provides nested disables within the overall
if (ret) { * enable relative to trigger. Specifically request_irq() is called
kfree(irq->name); * with NO_AUTOEN, therefore the IRQ is initially disabled. The user
eventfd_ctx_put(trigger); * may only further disable the IRQ with a MASK operations because
irq->trigger = NULL; * irq->masked is initially false.
return ret; */
} enable_irq(irq->hwirq);
if (!irq->masked)
enable_irq(irq->hwirq);
return 0; return 0;
} }
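After this change, vfio_set_trigger() no longer requests or frees the IRQ. It only pairs disable_irq() and enable_irq() around swapping the eventfd, because vfio_platform_irq_init() now requests every line up front with IRQF_NO_AUTOEN. The accounting described in the new comment can be sketched with a simple disable-depth counter. This is a hypothetical model, not kernel code, and it rests on three assumptions. First, a line fires only at depth 0. Second, IRQF_NO_AUTOEN leaves a line at depth 1. Third, MASK/UNMASK call disable/enable only when irq->masked actually changes. All `model_*` names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one vfio_platform_irq plus its IRQ descriptor. */
struct model_irq {
	int depth;       /* disable depth: the line fires only at depth 0 */
	int unbalanced;  /* enable attempts made while already at depth 0 */
	bool has_trigger;
	bool masked;
};

/* request_irq(..., IRQF_NO_AUTOEN, ...): requested but left disabled. */
static void model_request_no_autoen(struct model_irq *irq)
{
	irq->depth = 1;
	irq->unbalanced = 0;
	irq->has_trigger = false;
	irq->masked = false;
}

static void model_disable_irq(struct model_irq *irq)
{
	irq->depth++;
}

static void model_enable_irq(struct model_irq *irq)
{
	if (irq->depth == 0) {
		irq->unbalanced++;	/* the kernel would warn here */
		return;
	}
	irq->depth--;
}

/* Shape of vfio_set_trigger() after this change; "install" models fd >= 0. */
static void model_set_trigger(struct model_irq *irq, bool install)
{
	if (irq->has_trigger) {
		model_disable_irq(irq);
		irq->has_trigger = false;
	}
	if (!install)
		return;
	irq->has_trigger = true;
	model_enable_irq(irq);
}

/* MASK/UNMASK move the depth only when irq->masked changes. */
static void model_mask(struct model_irq *irq)
{
	if (!irq->masked) {
		model_disable_irq(irq);
		irq->masked = true;
	}
}

static void model_unmask(struct model_irq *irq)
{
	if (irq->masked) {
		model_enable_irq(irq);
		irq->masked = false;
	}
}

/* Invariant: one disable for "no trigger", one more for "masked". */
static int model_expected_depth(const struct model_irq *irq)
{
	return (irq->has_trigger ? 0 : 1) + (irq->masked ? 1 : 0);
}
```

The invariant in model_expected_depth() is the point of the new comment. One level of disable belongs to "no trigger installed" and at most one more to "user-masked". Because of this, enable_irq() can be called unconditionally once a trigger is installed, and the old `if (!irq->masked)` guard is no longer needed.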
...@@ -228,7 +226,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev, ...@@ -228,7 +226,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
handler = vfio_irq_handler; handler = vfio_irq_handler;
if (!count && (flags & VFIO_IRQ_SET_DATA_NONE)) if (!count && (flags & VFIO_IRQ_SET_DATA_NONE))
return vfio_set_trigger(vdev, index, -1, handler); return vfio_set_trigger(vdev, index, -1);
if (start != 0 || count != 1) if (start != 0 || count != 1)
return -EINVAL; return -EINVAL;
...@@ -236,7 +234,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev, ...@@ -236,7 +234,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
int32_t fd = *(int32_t *)data; int32_t fd = *(int32_t *)data;
return vfio_set_trigger(vdev, index, fd, handler); return vfio_set_trigger(vdev, index, fd);
} }
if (flags & VFIO_IRQ_SET_DATA_NONE) { if (flags & VFIO_IRQ_SET_DATA_NONE) {
...@@ -260,6 +258,14 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev, ...@@ -260,6 +258,14 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
unsigned start, unsigned count, uint32_t flags, unsigned start, unsigned count, uint32_t flags,
void *data) = NULL; void *data) = NULL;
/*
* For compatibility, errors from request_irq() are local to the
* SET_IRQS path and reflected in the name pointer. This allows,
* for example, polling mode fallback for an exclusive IRQ failure.
*/
if (IS_ERR(vdev->irqs[index].name))
return PTR_ERR(vdev->irqs[index].name);
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
case VFIO_IRQ_SET_ACTION_MASK: case VFIO_IRQ_SET_ACTION_MASK:
func = vfio_platform_set_irq_mask; func = vfio_platform_set_irq_mask;
...@@ -280,7 +286,7 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev, ...@@ -280,7 +286,7 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
int vfio_platform_irq_init(struct vfio_platform_device *vdev) int vfio_platform_irq_init(struct vfio_platform_device *vdev)
{ {
int cnt = 0, i; int cnt = 0, i, ret = 0;
while (vdev->get_irq(vdev, cnt) >= 0) while (vdev->get_irq(vdev, cnt) >= 0)
cnt++; cnt++;
...@@ -292,37 +298,70 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev) ...@@ -292,37 +298,70 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
for (i = 0; i < cnt; i++) { for (i = 0; i < cnt; i++) {
int hwirq = vdev->get_irq(vdev, i); int hwirq = vdev->get_irq(vdev, i);
irq_handler_t handler = vfio_irq_handler;
if (hwirq < 0) if (hwirq < 0) {
ret = -EINVAL;
goto err; goto err;
}
spin_lock_init(&vdev->irqs[i].lock); spin_lock_init(&vdev->irqs[i].lock);
vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD; vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK) if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK) {
vdev->irqs[i].flags |= VFIO_IRQ_INFO_MASKABLE vdev->irqs[i].flags |= VFIO_IRQ_INFO_MASKABLE
| VFIO_IRQ_INFO_AUTOMASKED; | VFIO_IRQ_INFO_AUTOMASKED;
handler = vfio_automasked_irq_handler;
}
vdev->irqs[i].count = 1; vdev->irqs[i].count = 1;
vdev->irqs[i].hwirq = hwirq; vdev->irqs[i].hwirq = hwirq;
vdev->irqs[i].masked = false; vdev->irqs[i].masked = false;
vdev->irqs[i].name = kasprintf(GFP_KERNEL_ACCOUNT,
"vfio-irq[%d](%s)", hwirq,
vdev->name);
if (!vdev->irqs[i].name) {
ret = -ENOMEM;
goto err;
}
ret = request_irq(hwirq, handler, IRQF_NO_AUTOEN,
vdev->irqs[i].name, &vdev->irqs[i]);
if (ret) {
kfree(vdev->irqs[i].name);
vdev->irqs[i].name = ERR_PTR(ret);
}
} }
vdev->num_irqs = cnt; vdev->num_irqs = cnt;
return 0; return 0;
err: err:
for (--i; i >= 0; i--) {
if (!IS_ERR(vdev->irqs[i].name)) {
free_irq(vdev->irqs[i].hwirq, &vdev->irqs[i]);
kfree(vdev->irqs[i].name);
}
}
kfree(vdev->irqs); kfree(vdev->irqs);
return -EINVAL; return ret;
} }
void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev) void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
{ {
int i; int i;
for (i = 0; i < vdev->num_irqs; i++) for (i = 0; i < vdev->num_irqs; i++) {
vfio_set_trigger(vdev, i, -1, NULL); vfio_virqfd_disable(&vdev->irqs[i].mask);
vfio_virqfd_disable(&vdev->irqs[i].unmask);
if (!IS_ERR(vdev->irqs[i].name)) {
free_irq(vdev->irqs[i].hwirq, &vdev->irqs[i]);
if (vdev->irqs[i].trigger)
eventfd_ctx_put(vdev->irqs[i].trigger);
kfree(vdev->irqs[i].name);
}
}
vdev->num_irqs = 0; vdev->num_irqs = 0;
kfree(vdev->irqs); kfree(vdev->irqs);
......
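vfio_platform_irq_init() now calls request_irq() for every line at init time, but a request_irq() failure is not fatal. The error is parked in irqs[i].name as an ERR_PTR() and only surfaces when userspace issues VFIO_DEVICE_SET_IRQS. Fatal errors, such as a bad hwirq or a failed name allocation, unwind the earlier entries and skip any entry whose name holds an error. The sketch below re-creates that flow in userspace. `MODEL_MAX_ERRNO`, `model_err_ptr()`, `model_is_err()` and `model_ptr_err()` imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, and the `requested` flag stands in for a registered handler. All names and the simulated inputs are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace imitation of ERR_PTR(): the top 4095 addresses encode -errno. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long err) { return (void *)(intptr_t)err; }
static long model_ptr_err(const void *p) { return (long)(intptr_t)p; }
static bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

struct model_irq {
	int hwirq;
	char *name;      /* heap string, or an encoded error */
	bool requested;  /* stands in for a registered handler */
};

/*
 * hwirqs[i] < 0 simulates get_irq() failing (fatal, unwinds);
 * req_ret[i] != 0 simulates request_irq() failing (kept in ->name).
 */
static int model_irq_init(struct model_irq *irqs, const int *hwirqs,
			  const int *req_ret, int cnt)
{
	int i, ret = 0;

	for (i = 0; i < cnt; i++) {
		if (hwirqs[i] < 0) {
			ret = -EINVAL;
			goto err;
		}
		irqs[i].hwirq = hwirqs[i];
		irqs[i].requested = false;
		irqs[i].name = malloc(32);
		if (!irqs[i].name) {
			ret = -ENOMEM;
			goto err;
		}
		snprintf(irqs[i].name, 32, "vfio-irq[%d](model)", hwirqs[i]);
		if (req_ret[i]) {
			free(irqs[i].name);
			irqs[i].name = model_err_ptr(req_ret[i]);
		} else {
			irqs[i].requested = true;
		}
	}
	return 0;
err:
	/* Undo entries 0..i-1, skipping those that never got a handler. */
	for (--i; i >= 0; i--) {
		if (!model_is_err(irqs[i].name)) {
			irqs[i].requested = false;	/* free_irq() */
			free(irqs[i].name);
			irqs[i].name = NULL;
		}
	}
	return ret;
}

/* Entry check of SET_IRQS: report a parked request_irq() error. */
static int model_set_irqs(const struct model_irq *irq)
{
	if (model_is_err(irq->name))
		return (int)model_ptr_err(irq->name);
	return 0;	/* would go on to dispatch the requested action */
}

static void model_irq_cleanup(struct model_irq *irqs, int cnt)
{
	for (int i = 0; i < cnt; i++) {
		if (!model_is_err(irqs[i].name)) {
			irqs[i].requested = false;
			free(irqs[i].name);
			irqs[i].name = NULL;
		}
	}
}
```

Per the comment in the diff, keeping the error local to SET_IRQS lets a user fall back to polling when an exclusive IRQ cannot be obtained, instead of failing initialization for every IRQ of the device.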
...@@ -567,18 +567,6 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr, ...@@ -567,18 +567,6 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM, ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
pages, NULL); pages, NULL);
if (ret > 0) { if (ret > 0) {
int i;
/*
* The zero page is always resident, we don't need to pin it
* and it falls into our invalid/reserved test so we don't
* unpin in put_pfn(). Unpin all zero pages in the batch here.
*/
for (i = 0 ; i < ret; i++) {
if (unlikely(is_zero_pfn(page_to_pfn(pages[i]))))
unpin_user_page(pages[i]);
}
*pfn = page_to_pfn(pages[0]); *pfn = page_to_pfn(pages[0]);
goto done; goto done;
} }
......
...@@ -101,6 +101,13 @@ static void virqfd_inject(struct work_struct *work) ...@@ -101,6 +101,13 @@ static void virqfd_inject(struct work_struct *work)
virqfd->thread(virqfd->opaque, virqfd->data); virqfd->thread(virqfd->opaque, virqfd->data);
} }
static void virqfd_flush_inject(struct work_struct *work)
{
struct virqfd *virqfd = container_of(work, struct virqfd, flush_inject);
flush_work(&virqfd->inject);
}
int vfio_virqfd_enable(void *opaque, int vfio_virqfd_enable(void *opaque,
int (*handler)(void *, void *), int (*handler)(void *, void *),
void (*thread)(void *, void *), void (*thread)(void *, void *),
...@@ -124,6 +131,7 @@ int vfio_virqfd_enable(void *opaque, ...@@ -124,6 +131,7 @@ int vfio_virqfd_enable(void *opaque,
INIT_WORK(&virqfd->shutdown, virqfd_shutdown); INIT_WORK(&virqfd->shutdown, virqfd_shutdown);
INIT_WORK(&virqfd->inject, virqfd_inject); INIT_WORK(&virqfd->inject, virqfd_inject);
INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
irqfd = fdget(fd); irqfd = fdget(fd);
if (!irqfd.file) { if (!irqfd.file) {
...@@ -213,3 +221,16 @@ void vfio_virqfd_disable(struct virqfd **pvirqfd) ...@@ -213,3 +221,16 @@ void vfio_virqfd_disable(struct virqfd **pvirqfd)
flush_workqueue(vfio_irqfd_cleanup_wq); flush_workqueue(vfio_irqfd_cleanup_wq);
} }
EXPORT_SYMBOL_GPL(vfio_virqfd_disable); EXPORT_SYMBOL_GPL(vfio_virqfd_disable);
void vfio_virqfd_flush_thread(struct virqfd **pvirqfd)
{
unsigned long flags;
spin_lock_irqsave(&virqfd_lock, flags);
if (*pvirqfd && (*pvirqfd)->thread)
queue_work(vfio_irqfd_cleanup_wq, &(*pvirqfd)->flush_inject);
spin_unlock_irqrestore(&virqfd_lock, flags);
flush_workqueue(vfio_irqfd_cleanup_wq);
}
EXPORT_SYMBOL_GPL(vfio_virqfd_flush_thread);
...@@ -12677,6 +12677,11 @@ struct mlx5_ifc_modify_page_track_obj_in_bits { ...@@ -12677,6 +12677,11 @@ struct mlx5_ifc_modify_page_track_obj_in_bits {
struct mlx5_ifc_page_track_bits obj_context; struct mlx5_ifc_page_track_bits obj_context;
}; };
struct mlx5_ifc_query_page_track_obj_out_bits {
struct mlx5_ifc_general_obj_out_cmd_hdr_bits general_obj_out_cmd_hdr;
struct mlx5_ifc_page_track_bits obj_context;
};
struct mlx5_ifc_msecq_reg_bits { struct mlx5_ifc_msecq_reg_bits {
u8 reserved_at_0[0x20]; u8 reserved_at_0[0x20];
......
...@@ -356,6 +356,7 @@ struct virqfd { ...@@ -356,6 +356,7 @@ struct virqfd {
wait_queue_entry_t wait; wait_queue_entry_t wait;
poll_table pt; poll_table pt;
struct work_struct shutdown; struct work_struct shutdown;
struct work_struct flush_inject;
struct virqfd **pvirqfd; struct virqfd **pvirqfd;
}; };
...@@ -363,5 +364,6 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *), ...@@ -363,5 +364,6 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *),
void (*thread)(void *, void *), void *data, void (*thread)(void *, void *), void *data,
struct virqfd **pvirqfd, int fd); struct virqfd **pvirqfd, int fd);
void vfio_virqfd_disable(struct virqfd **pvirqfd); void vfio_virqfd_disable(struct virqfd **pvirqfd);
void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
#endif /* VFIO_H */ #endif /* VFIO_H */
...@@ -130,7 +130,15 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev); ...@@ -130,7 +130,15 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar); int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev, pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state); pci_channel_state_t state);
ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
void __iomem *io, char __user *buf,
loff_t off, size_t count, size_t x_start,
size_t x_end, bool iswrite);
bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
loff_t reg_start, size_t reg_cnt,
loff_t *buf_offset,
size_t *intersect_count,
size_t *register_offset);
#define VFIO_IOWRITE_DECLATION(size) \ #define VFIO_IOWRITE_DECLATION(size) \
int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev, \ int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev, \
bool test_mem, u##size val, void __iomem *io); bool test_mem, u##size val, void __iomem *io);
......
...@@ -133,7 +133,9 @@ static struct mdev_type *mbochs_mdev_types[] = { ...@@ -133,7 +133,9 @@ static struct mdev_type *mbochs_mdev_types[] = {
}; };
static dev_t mbochs_devt; static dev_t mbochs_devt;
static struct class *mbochs_class; static const struct class mbochs_class = {
.name = MBOCHS_CLASS_NAME,
};
static struct cdev mbochs_cdev; static struct cdev mbochs_cdev;
static struct device mbochs_dev; static struct device mbochs_dev;
static struct mdev_parent mbochs_parent; static struct mdev_parent mbochs_parent;
...@@ -1422,13 +1424,10 @@ static int __init mbochs_dev_init(void) ...@@ -1422,13 +1424,10 @@ static int __init mbochs_dev_init(void)
if (ret) if (ret)
goto err_cdev; goto err_cdev;
mbochs_class = class_create(MBOCHS_CLASS_NAME); ret = class_register(&mbochs_class);
if (IS_ERR(mbochs_class)) { if (ret)
pr_err("Error: failed to register mbochs_dev class\n");
ret = PTR_ERR(mbochs_class);
goto err_driver; goto err_driver;
} mbochs_dev.class = &mbochs_class;
mbochs_dev.class = mbochs_class;
mbochs_dev.release = mbochs_device_release; mbochs_dev.release = mbochs_device_release;
dev_set_name(&mbochs_dev, "%s", MBOCHS_NAME); dev_set_name(&mbochs_dev, "%s", MBOCHS_NAME);
...@@ -1448,7 +1447,7 @@ static int __init mbochs_dev_init(void) ...@@ -1448,7 +1447,7 @@ static int __init mbochs_dev_init(void)
device_del(&mbochs_dev); device_del(&mbochs_dev);
err_put: err_put:
put_device(&mbochs_dev); put_device(&mbochs_dev);
class_destroy(mbochs_class); class_unregister(&mbochs_class);
err_driver: err_driver:
mdev_unregister_driver(&mbochs_driver); mdev_unregister_driver(&mbochs_driver);
err_cdev: err_cdev:
...@@ -1466,8 +1465,7 @@ static void __exit mbochs_dev_exit(void) ...@@ -1466,8 +1465,7 @@ static void __exit mbochs_dev_exit(void)
mdev_unregister_driver(&mbochs_driver); mdev_unregister_driver(&mbochs_driver);
cdev_del(&mbochs_cdev); cdev_del(&mbochs_cdev);
unregister_chrdev_region(mbochs_devt, MINORMASK + 1); unregister_chrdev_region(mbochs_devt, MINORMASK + 1);
class_destroy(mbochs_class); class_unregister(&mbochs_class);
mbochs_class = NULL;
} }
MODULE_IMPORT_NS(DMA_BUF); MODULE_IMPORT_NS(DMA_BUF);
......
...@@ -84,7 +84,9 @@ static struct mdev_type *mdpy_mdev_types[] = { ...@@ -84,7 +84,9 @@ static struct mdev_type *mdpy_mdev_types[] = {
}; };
static dev_t mdpy_devt; static dev_t mdpy_devt;
static struct class *mdpy_class; static const struct class mdpy_class = {
.name = MDPY_CLASS_NAME,
};
static struct cdev mdpy_cdev; static struct cdev mdpy_cdev;
static struct device mdpy_dev; static struct device mdpy_dev;
static struct mdev_parent mdpy_parent; static struct mdev_parent mdpy_parent;
...@@ -709,13 +711,10 @@ static int __init mdpy_dev_init(void) ...@@ -709,13 +711,10 @@ static int __init mdpy_dev_init(void)
if (ret) if (ret)
goto err_cdev; goto err_cdev;
mdpy_class = class_create(MDPY_CLASS_NAME); ret = class_register(&mdpy_class);
if (IS_ERR(mdpy_class)) { if (ret)
pr_err("Error: failed to register mdpy_dev class\n");
ret = PTR_ERR(mdpy_class);
goto err_driver; goto err_driver;
} mdpy_dev.class = &mdpy_class;
mdpy_dev.class = mdpy_class;
mdpy_dev.release = mdpy_device_release; mdpy_dev.release = mdpy_device_release;
dev_set_name(&mdpy_dev, "%s", MDPY_NAME); dev_set_name(&mdpy_dev, "%s", MDPY_NAME);
...@@ -735,7 +734,7 @@ static int __init mdpy_dev_init(void) ...@@ -735,7 +734,7 @@ static int __init mdpy_dev_init(void)
device_del(&mdpy_dev); device_del(&mdpy_dev);
err_put: err_put:
put_device(&mdpy_dev); put_device(&mdpy_dev);
class_destroy(mdpy_class); class_unregister(&mdpy_class);
err_driver: err_driver:
mdev_unregister_driver(&mdpy_driver); mdev_unregister_driver(&mdpy_driver);
err_cdev: err_cdev:
...@@ -753,8 +752,7 @@ static void __exit mdpy_dev_exit(void) ...@@ -753,8 +752,7 @@ static void __exit mdpy_dev_exit(void)
mdev_unregister_driver(&mdpy_driver); mdev_unregister_driver(&mdpy_driver);
cdev_del(&mdpy_cdev); cdev_del(&mdpy_cdev);
unregister_chrdev_region(mdpy_devt, MINORMASK + 1); unregister_chrdev_region(mdpy_devt, MINORMASK + 1);
class_destroy(mdpy_class); class_unregister(&mdpy_class);
mdpy_class = NULL;
} }
module_param_named(count, mdpy_driver.max_instances, int, 0444); module_param_named(count, mdpy_driver.max_instances, int, 0444);
......