Commit 1a65c39e authored by Jason Gunthorpe

Merge patch series "IOMMUFD Generic interface"

Jason Gunthorpe <jgg@nvidia.com> says:

==================
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.

It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.

We see a broad need for extended features, some being highly IOMMU device
specific:
 - Binding iommu_domain's to PASID/SSID
 - Userspace IO page tables, for ARM, x86 and S390
 - Kernel bypassed invalidation of user page tables
 - Re-use of the KVM page table in the IOMMU
 - Dirty page tracking in the IOMMU
 - Runtime Increase/Decrease of IOPTE size
 - PRI support with faults resolved in userspace

Many of these HW features exist to support VM use cases - for instance the
combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a
guest. Dirty tracking enables VM live migration with SRIOV devices and
PASID support allows creating "scalable IOV" devices, among other things.

As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs, which
is currently VFIO and VDPA.

The pre-v1 series proposed re-using the VFIO type 1 data structure,
however it was suggested that if we are doing this big update then we
should also come with an improved data structure that solves the
limitations that VFIO type1 has. Notably this addresses:

 - Multiple IOAS/'containers' and multiple domains inside a single FD

 - Single-pin operation no matter how many domains and containers use
   a page

 - A fine grained locking scheme supporting user managed concurrency for
   multi-threaded map/unmap

 - A pre-registration mechanism to optimize vIOMMU use cases by
   pre-pinning pages

 - Extended ioctl API that can manage these new objects and exposes
   domains directly to user space

 - domains are sharable between subsystems, eg VFIO and VDPA

The bulk of this code is a new data structure design to track how the
IOVAs are mapped to PFNs.

iommufd intends to be general and consumable by any driver that wants to
DMA to userspace. From a driver perspective it can largely be dropped in
in place of iommu_attach_device() and provides a uniform full feature set
to all consumers.

As this is a larger project this series is the first step. This series
provides the iommufd "generic interface" which is designed to be suitable
for applications like DPDK and VMM flows that are not optimized to
specific HW scenarios. It is close to being a drop in replacement for the
existing VFIO type 1 and supports existing qemu based VM flows.

Several follow-on series are being prepared:

- Patches integrating with qemu in native mode:
  https://github.com/yiliu1765/qemu/commits/qemu-iommufd-6.0-rc2

- A completed integration with VFIO now exists that covers "emulated" mdev
  use cases and can pass testing with qemu/etc in compatibility mode:
  https://github.com/jgunthorpe/linux/commits/vfio_iommufd

- A draft providing system iommu dirty tracking on top of iommufd,
  including iommu driver implementations:
  https://github.com/jpemartins/linux/commits/x86-iommufd

  This pairs with patches for providing a similar API to support VFIO-device
  tracking to give a complete vfio solution:
  https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@nvidia.com/

- Userspace page tables aka 'nested translation' for ARM and Intel iommu
  drivers:
  https://github.com/nicolinc/iommufd/commits/iommufd_nesting

- "device centric" vfio series to expose the vfio_device FD directly as a
  normal cdev, and provide an extended API allowing dynamically changing
  the IOAS binding:
  https://github.com/yiliu1765/iommufd/commits/iommufd-v6.0-rc2-nesting-0901

- Drafts for PASID and PRI interfaces are included above as well

Overall enough work is done now to show the merit of the new API design
and at least draft solutions to many of the main problems.

Several people have contributed directly to this work: Eric Auger, Joao
Martins, Kevin Tian, Lu Baolu, Nicolin Chen, Yi L Liu. Many more have
participated in the discussions that led here, and provided ideas. Thanks
to all!

The v1/v2 iommufd series has been used to guide a large amount of preparatory
work that has now been merged. The general theme is to organize things in
a way that makes injecting iommufd natural:

 - VFIO live migration support with mlx5 and hisi_acc drivers.
   These series need a dirty tracking solution to be really usable.
   https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@nvidia.com/
   https://lore.kernel.org/kvm/20220308184902.2242-1-shameerali.kolothum.thodi@huawei.com/

 - Significantly rework the VFIO gvt mdev and remove struct
   mdev_parent_ops
   https://lore.kernel.org/lkml/20220411141403.86980-1-hch@lst.de/

 - Rework how PCIe no-snoop blocking works
   https://lore.kernel.org/kvm/0-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com/

 - Consolidate dma ownership into the iommu core code
   https://lore.kernel.org/linux-iommu/20220418005000.897664-1-baolu.lu@linux.intel.com/

 - Make all vfio driver interfaces use struct vfio_device consistently
   https://lore.kernel.org/kvm/0-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com/

 - Remove the vfio_group from the kvm/vfio interface
   https://lore.kernel.org/kvm/0-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com/

 - Simplify locking in vfio
   https://lore.kernel.org/kvm/0-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com/

 - Remove the vfio notifier scheme that faces drivers
   https://lore.kernel.org/kvm/0-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com/

 - Improve the driver facing API for vfio pin/unpin pages to make the
   presence of struct page clear
   https://lore.kernel.org/kvm/20220723020256.30081-1-nicolinc@nvidia.com/

 - Clean up in the Intel IOMMU driver
   https://lore.kernel.org/linux-iommu/20220301020159.633356-1-baolu.lu@linux.intel.com/
   https://lore.kernel.org/linux-iommu/20220510023407.2759143-1-baolu.lu@linux.intel.com/
   https://lore.kernel.org/linux-iommu/20220514014322.2927339-1-baolu.lu@linux.intel.com/
   https://lore.kernel.org/linux-iommu/20220706025524.2904370-1-baolu.lu@linux.intel.com/
   https://lore.kernel.org/linux-iommu/20220702015610.2849494-1-baolu.lu@linux.intel.com/

 - Rework s390 vfio drivers
   https://lore.kernel.org/kvm/20220707135737.720765-1-farman@linux.ibm.com/

 - Normalize vfio ioctl handling
   https://lore.kernel.org/kvm/0-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com/

 - VFIO API for dirty tracking (aka dma logging) managed inside a PCI
   device, with mlx5 implementation
   https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@nvidia.com

 - Introduce a struct device sysfs presence for struct vfio_device
   https://lore.kernel.org/kvm/20220901143747.32858-1-kevin.tian@intel.com/

 - Complete restructuring the vfio mdev model
   https://lore.kernel.org/kvm/20220822062208.152745-1-hch@lst.de/

 - Isolate VFIO container code in preparation for iommufd to provide an
   alternative implementation of it all
   https://lore.kernel.org/kvm/0-v1-a805b607f1fb+17b-vfio_container_split_jgg@nvidia.com

 - Simplify and consolidate iommu_domain/device compatibility checking
   https://lore.kernel.org/linux-iommu/cover.1666042872.git.nicolinc@nvidia.com/

 - Align iommu SVA support with the domain-centric model
   https://lore.kernel.org/all/20221031005917.45690-1-baolu.lu@linux.intel.com/

This is about 233 patches applied since March, thank you to everyone
involved in all this work!

Currently there are a number of supporting series still in progress:

 - DMABUF exporter support for VFIO to allow PCI P2P with VFIO
   https://lore.kernel.org/r/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com

 - Start to provide iommu_domain ops for POWER
   https://lore.kernel.org/all/20220714081822.3717693-1-aik@ozlabs.ru/

However, these are not necessary for this series to advance.

Syzkaller coverage has been merged and is now running in the syzbot
environment on linux-next:

https://github.com/google/syzkaller/pull/3515
https://github.com/google/syzkaller/pull/3521
==================

Link: https://lore.kernel.org/r/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
parents 69e61ede 57f09887
......@@ -440,8 +440,11 @@ ForEachMacros:
- 'inet_lhash2_for_each_icsk'
- 'inet_lhash2_for_each_icsk_continue'
- 'inet_lhash2_for_each_icsk_rcu'
- 'interval_tree_for_each_double_span'
- 'interval_tree_for_each_span'
- 'intlist__for_each_entry'
- 'intlist__for_each_entry_safe'
- 'iopt_for_each_contig_area'
- 'kcore_copy__for_each_phdr'
- 'key_for_each'
- 'key_for_each_safe'
......
......@@ -25,6 +25,7 @@ place where this information is gathered.
ebpf/index
ioctl/index
iommu
iommufd
media/index
netlink/index
sysfs-platform_profile
......
......@@ -105,6 +105,7 @@ Code Seq# Include File Comments
'8' all SNP8023 advanced NIC card
<mailto:mcr@solidum.com>
';' 64-7F linux/vfio.h
';' 80-FF linux/iommufd.h
'=' 00-3f uapi/linux/ptp_clock.h <mailto:richardcochran@gmail.com>
'@' 00-0F linux/radeonfb.h conflict!
'@' 00-0F drivers/video/aty/aty128fb.c conflict!
......
.. SPDX-License-Identifier: GPL-2.0+
=======
IOMMUFD
=======
:Author: Jason Gunthorpe
:Author: Kevin Tian
Overview
========
IOMMUFD is the user API to control the IOMMU subsystem as it relates to managing
IO page tables from userspace using file descriptors. It intends to be general
and consumable by any driver that wants to expose DMA to userspace. These
drivers are eventually expected to deprecate any internal IOMMU logic
they may already/historically implement (e.g. vfio_iommu_type1.c).
At minimum iommufd provides universal support of managing I/O address spaces and
I/O page tables for all IOMMUs, with room in the design to add non-generic
features to cater to specific hardware functionality.
In this context the capital letter (IOMMUFD) refers to the subsystem while the
small letter (iommufd) refers to the file descriptors created via /dev/iommu for
use by userspace.
Key Concepts
============
User Visible Objects
--------------------
The following IOMMUFD objects are exposed to userspace:
- IOMMUFD_OBJ_IOAS, representing an I/O address space (IOAS), allowing map/unmap
of user space memory into ranges of I/O Virtual Address (IOVA).
The IOAS is a functional replacement for the VFIO container, and like the VFIO
container it copies an IOVA map to a list of iommu_domains held within it.
- IOMMUFD_OBJ_DEVICE, representing a device that is bound to iommufd by an
external driver.
- IOMMUFD_OBJ_HW_PAGETABLE, representing an actual hardware I/O page table
(i.e. a single struct iommu_domain) managed by the iommu driver.
The IOAS has a list of HW_PAGETABLES that share the same IOVA mapping and
it will synchronize its mapping with each member HW_PAGETABLE.
All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagram below shows the relationship between user-visible objects and kernel
datastructures (external to iommufd), with numbers referring to the operations
creating the objects and links::
  _________________________________________________________
 |                         iommufd                         |
 |        [1]                                              |
 |  _________________                                      |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |        [3]                 [2]      |
 | |                 |    ____________         __________  |
 | |      IOAS       |<--|            |<------|          | |
 | |                 |   |HW_PAGETABLE|       |  DEVICE  | |
 | |                 |   |____________|       |__________| |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |_________________|         |                   |       |
 |         |                   |                   |       |
 |_________|___________________|___________________|_______|
           |                   |                   |
           |              _____v______      _______v_____
           | PFN storage |            |    |             |
           |------------>|iommu_domain|    |struct device|
                         |____________|    |_____________|
1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can
hold multiple IOAS objects. IOAS is the most generic object and does not
expose interfaces that are specific to single IOMMU drivers. All operations
on the IOAS must operate equally on each of the iommu_domains inside of it.
2. IOMMUFD_OBJ_DEVICE is created when an external driver calls the IOMMUFD kAPI
to bind a device to an iommufd. The driver is expected to implement a set of
ioctls to allow userspace to initiate the binding operation. Successful
completion of this operation establishes the desired DMA ownership over the
device. The driver must also set the driver_managed_dma flag and must not
touch the device until this operation succeeds.
3. IOMMUFD_OBJ_HW_PAGETABLE is created when an external driver calls the IOMMUFD
kAPI to attach a bound device to an IOAS. Similarly the external driver uAPI
allows userspace to initiate the attaching operation. If a compatible
pagetable already exists then it is reused for the attachment. Otherwise a
new pagetable object and iommu_domain is created. Successful completion of
this operation sets up the linkages among IOAS, device and iommu_domain. Once
this completes the device could do DMA.
Every iommu_domain inside the IOAS is also represented to userspace as a
HW_PAGETABLE object.
.. note::
Future IOMMUFD updates will provide an API to create and manipulate the
HW_PAGETABLE directly.
Because of the DMA ownership claim, a device can bind to only one iommufd, and
it can attach to at most one IOAS object (PASID is not supported yet).
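The DMA ownership claim from step 2 is counted per iommu_group. The rule stated for iommu_device_claim_dma_owner() in this series is that several devices in one group may hold ownership together only if they present the same owner pointer. A minimal userspace sketch of that rule follows. `struct group_model`, `model_claim()` and `model_release()` are hypothetical names used only for illustration, not kernel API, and the sketch leaves out the group mutex and the switch to the blocking domain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the owner fields of struct iommu_group. */
struct group_model {
	void *owner;            /* cookie of the current owner, NULL if unowned */
	unsigned int owner_cnt; /* number of outstanding claims */
};

/* Model of a per-device claim: the same owner may claim repeatedly. */
static int model_claim(struct group_model *g, void *owner)
{
	if (!owner)
		return -EINVAL;
	if (g->owner_cnt) {
		if (g->owner != owner)
			return -EPERM; /* group is already owned by someone else */
		g->owner_cnt++;
		return 0;
	}
	g->owner = owner;
	g->owner_cnt = 1;
	return 0;
}

/* Model of a per-device release: the last release clears the owner. */
static void model_release(struct group_model *g)
{
	assert(g->owner_cnt > 0);
	if (--g->owner_cnt == 0)
		g->owner = NULL;
}
```

In this model two claims with the same cookie succeed and raise `owner_cnt` to 2, a claim with a different cookie fails with `-EPERM`, and the group becomes claimable again only after both releases.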
Kernel Datastructure
--------------------
User visible objects are backed by the following datastructures:
- iommufd_ioas for IOMMUFD_OBJ_IOAS.
- iommufd_device for IOMMUFD_OBJ_DEVICE.
- iommufd_hw_pagetable for IOMMUFD_OBJ_HW_PAGETABLE.
Some terminology used when looking at these datastructures:
- Automatic domain - refers to an iommu domain created automatically when
attaching a device to an IOAS object. This is compatible with the semantics of
VFIO type1.
- Manual domain - refers to an iommu domain designated by the user as the
target pagetable to be attached to by a device. Though currently there are
no uAPIs to directly create such a domain, the datastructure and algorithms
are ready for handling that use case.
- In-kernel user - refers to something like a VFIO mdev that is using the
IOMMUFD access interface to access the IOAS. This starts by creating an
iommufd_access object that is similar to the domain binding a physical device
would do. The access object will then allow converting IOVA ranges into struct
page * lists, or doing direct read/write to an IOVA.
iommufd_ioas serves as the metadata datastructure to manage how IOVA ranges are
mapped to memory pages, composed of:
- struct io_pagetable holding the IOVA map
- struct iopt_area's representing populated portions of IOVA
- struct iopt_pages representing the storage of PFNs
- struct iommu_domain representing the IO page table in the IOMMU
- struct iopt_pages_access representing in-kernel users of PFNs
- struct xarray pinned_pfns holding a list of pages pinned by in-kernel users
Each iopt_pages represents a logical linear array of full PFNs. The PFNs are
ultimately derived from userspace VAs via an mm_struct. Once they have been
pinned the PFNs are stored in IOPTEs of an iommu_domain or inside the pinned_pfns
xarray if they have been pinned through an iommufd_access.
PFNs have to be copied between all combinations of storage locations, depending
on what domains are present and what kinds of in-kernel "software access" users
exist. The mechanism ensures that a page is pinned only once.
An io_pagetable is composed of iopt_areas pointing at iopt_pages, along with a
list of iommu_domains that mirror the IOVA to PFN map.
Multiple io_pagetable-s, through their iopt_area-s, can share a single
iopt_pages which avoids multi-pinning and double accounting of page
consumption.
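The "pinned only once" property can be pictured with a small userspace model. This is only a sketch under simplifying assumptions: it keeps a per-page user count, whereas the header comments in this series describe iopt_pages as tracking users through interval trees. `struct pages_model`, `model_add_user()` and `model_del_user()` are hypothetical names, and "pinning" is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NPAGES 16

/* Hypothetical stand-in for iopt_pages: per-page user counts. */
struct pages_model {
	unsigned int users[MODEL_NPAGES]; /* areas/accesses covering each page */
	size_t npinned;                   /* pages currently pinned */
	size_t pin_calls;                 /* how often a real pin would have happened */
};

/* A new user covers pages [first, last]; pin only on the 0 -> 1 transition. */
static void model_add_user(struct pages_model *p, size_t first, size_t last)
{
	assert(first <= last && last < MODEL_NPAGES);
	for (size_t i = first; i <= last; i++) {
		if (p->users[i]++ == 0) {
			p->npinned++;
			p->pin_calls++;
		}
	}
}

/* A user goes away; unpin only when the last user of a page is gone. */
static void model_del_user(struct pages_model *p, size_t first, size_t last)
{
	assert(first <= last && last < MODEL_NPAGES);
	for (size_t i = first; i <= last; i++) {
		assert(p->users[i] > 0);
		if (--p->users[i] == 0)
			p->npinned--;
	}
}
```

If one user covers pages 0..7 and a second covers pages 4..11, the model pins 12 pages in total rather than 16, because the four shared pages are pinned on first use and merely gain a second user afterwards.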
iommufd_ioas is sharable between subsystems, e.g. VFIO and VDPA, as long as
devices managed by different subsystems are bound to the same iommufd.
IOMMUFD User API
================
.. kernel-doc:: include/uapi/linux/iommufd.h
IOMMUFD Kernel API
==================
The IOMMUFD kAPI is device-centric with group-related tricks managed behind the
scenes. This allows the external drivers calling such kAPI to implement a simple
device-centric uAPI for connecting its device to an iommufd, instead of
explicitly imposing the group semantics in its uAPI as VFIO does.
.. kernel-doc:: drivers/iommu/iommufd/device.c
:export:
.. kernel-doc:: drivers/iommu/iommufd/main.c
:export:
VFIO and IOMMUFD
----------------
Connecting a VFIO device to iommufd can be done in two ways.
First is a VFIO compatible way by directly implementing the /dev/vfio/vfio
container IOCTLs by mapping them into io_pagetable operations. Doing so allows
the use of iommufd in legacy VFIO applications by symlinking /dev/vfio/vfio to
/dev/iommu or extending VFIO to SET_CONTAINER using an iommufd instead of a
container fd.
The second approach directly extends VFIO to support a new set of device-centric
user APIs based on the aforementioned IOMMUFD kernel API. It requires userspace
changes but better matches the IOMMUFD API semantics and makes it easier to
support new iommufd features than the first approach.
Currently both approaches are still work-in-progress.
There are still a few gaps to be resolved to catch up with VFIO type1, as
documented in iommufd_vfio_check_extension().
Future TODOs
============
Currently IOMMUFD supports only kernel-managed I/O page tables, similar to VFIO
type1. New features on the radar include:
- Binding iommu_domain's to PASID/SSID
- Userspace page tables, for ARM, x86 and S390
- Kernel bypass'd invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
......@@ -10717,6 +10717,18 @@ F: drivers/iommu/dma-iommu.h
F: drivers/iommu/iova.c
F: include/linux/iova.h
IOMMUFD
M: Jason Gunthorpe <jgg@nvidia.com>
M: Kevin Tian <kevin.tian@intel.com>
L: iommu@lists.linux.dev
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git
F: Documentation/userspace-api/iommufd.rst
F: drivers/iommu/iommufd/
F: include/linux/iommufd.h
F: include/uapi/linux/iommufd.h
F: tools/testing/selftests/iommu/
IOMMU SUBSYSTEM
M: Joerg Roedel <joro@8bytes.org>
M: Will Deacon <will@kernel.org>
......
......@@ -188,6 +188,7 @@ config MSM_IOMMU
source "drivers/iommu/amd/Kconfig"
source "drivers/iommu/intel/Kconfig"
source "drivers/iommu/iommufd/Kconfig"
config IRQ_REMAP
bool "Support for Interrupt Remapping"
......
# SPDX-License-Identifier: GPL-2.0
obj-y += amd/ intel/ arm/
obj-y += amd/ intel/ arm/ iommufd/
obj-$(CONFIG_IOMMU_API) += iommu.o
obj-$(CONFIG_IOMMU_API) += iommu-traces.o
obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
......
......@@ -2278,6 +2278,8 @@ static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
return false;
case IOMMU_CAP_PRE_BOOT_PROTECTION:
return amdr_ivrs_remap_support;
case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
return true;
default:
break;
}
......
......@@ -4450,14 +4450,20 @@ static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
static bool intel_iommu_capable(struct device *dev, enum iommu_cap cap)
{
if (cap == IOMMU_CAP_CACHE_COHERENCY)
struct device_domain_info *info = dev_iommu_priv_get(dev);
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
return true;
if (cap == IOMMU_CAP_INTR_REMAP)
case IOMMU_CAP_INTR_REMAP:
return irq_remapping_enabled == 1;
if (cap == IOMMU_CAP_PRE_BOOT_PROTECTION)
case IOMMU_CAP_PRE_BOOT_PROTECTION:
return dmar_platform_optin();
return false;
case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
return ecap_sc_support(info->iommu->ecap);
default:
return false;
}
}
static struct iommu_device *intel_iommu_probe_device(struct device *dev)
......
......@@ -3108,41 +3108,49 @@ static int __iommu_group_alloc_blocking_domain(struct iommu_group *group)
return 0;
}
static int __iommu_take_dma_ownership(struct iommu_group *group, void *owner)
{
int ret;
if ((group->domain && group->domain != group->default_domain) ||
!xa_empty(&group->pasid_array))
return -EBUSY;
ret = __iommu_group_alloc_blocking_domain(group);
if (ret)
return ret;
ret = __iommu_group_set_domain(group, group->blocking_domain);
if (ret)
return ret;
group->owner = owner;
group->owner_cnt++;
return 0;
}
/**
* iommu_group_claim_dma_owner() - Set DMA ownership of a group
* @group: The group.
* @owner: Caller specified pointer. Used for exclusive ownership.
*
* This is to support backward compatibility for vfio which manages
* the dma ownership in iommu_group level. New invocations on this
* interface should be prohibited.
* This is to support backward compatibility for vfio which manages the dma
* ownership in iommu_group level. New invocations on this interface should be
* prohibited. Only a single owner may exist for a group.
*/
int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
{
int ret = 0;
if (WARN_ON(!owner))
return -EINVAL;
mutex_lock(&group->mutex);
if (group->owner_cnt) {
ret = -EPERM;
goto unlock_out;
} else {
if ((group->domain && group->domain != group->default_domain) ||
!xa_empty(&group->pasid_array)) {
ret = -EBUSY;
goto unlock_out;
}
ret = __iommu_group_alloc_blocking_domain(group);
if (ret)
goto unlock_out;
ret = __iommu_group_set_domain(group, group->blocking_domain);
if (ret)
goto unlock_out;
group->owner = owner;
}
group->owner_cnt++;
ret = __iommu_take_dma_ownership(group, owner);
unlock_out:
mutex_unlock(&group->mutex);
......@@ -3151,30 +3159,91 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner);
/**
* iommu_group_release_dma_owner() - Release DMA ownership of a group
* @group: The group.
* iommu_device_claim_dma_owner() - Set DMA ownership of a device
* @dev: The device.
* @owner: Caller specified pointer. Used for exclusive ownership.
*
* Release the DMA ownership claimed by iommu_group_claim_dma_owner().
* Claim the DMA ownership of a device. Multiple devices in the same group may
* concurrently claim ownership if they present the same owner value. Returns 0
* on success and error code on failure
*/
void iommu_group_release_dma_owner(struct iommu_group *group)
int iommu_device_claim_dma_owner(struct device *dev, void *owner)
{
int ret;
struct iommu_group *group = iommu_group_get(dev);
int ret = 0;
if (!group)
return -ENODEV;
if (WARN_ON(!owner))
return -EINVAL;
mutex_lock(&group->mutex);
if (group->owner_cnt) {
if (group->owner != owner) {
ret = -EPERM;
goto unlock_out;
}
group->owner_cnt++;
goto unlock_out;
}
ret = __iommu_take_dma_ownership(group, owner);
unlock_out:
mutex_unlock(&group->mutex);
iommu_group_put(group);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_device_claim_dma_owner);
static void __iommu_release_dma_ownership(struct iommu_group *group)
{
int ret;
if (WARN_ON(!group->owner_cnt || !group->owner ||
!xa_empty(&group->pasid_array)))
goto unlock_out;
return;
group->owner_cnt = 0;
group->owner = NULL;
ret = __iommu_group_set_domain(group, group->default_domain);
WARN(ret, "iommu driver failed to attach the default domain");
}
unlock_out:
/**
* iommu_group_release_dma_owner() - Release DMA ownership of a group
* @group: The group.
*
* Release the DMA ownership claimed by iommu_group_claim_dma_owner().
*/
void iommu_group_release_dma_owner(struct iommu_group *group)
{
mutex_lock(&group->mutex);
__iommu_release_dma_ownership(group);
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
/**
* iommu_device_release_dma_owner() - Release DMA ownership of a device
* @dev: The device.
*
* Release the DMA ownership claimed by iommu_device_claim_dma_owner().
*/
void iommu_device_release_dma_owner(struct device *dev)
{
struct iommu_group *group = iommu_group_get(dev);
mutex_lock(&group->mutex);
if (group->owner_cnt > 1)
group->owner_cnt--;
else
__iommu_release_dma_ownership(group);
mutex_unlock(&group->mutex);
iommu_group_put(group);
}
EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
/**
* iommu_group_dma_owner_claimed() - Query group dma ownership status
* @group: The group.
......
# SPDX-License-Identifier: GPL-2.0-only
config IOMMUFD
tristate "IOMMU Userspace API"
select INTERVAL_TREE
select INTERVAL_TREE_SPAN_ITER
select IOMMU_API
default n
help
Provides /dev/iommu, the user API to control the IOMMU subsystem as
it relates to managing IO page tables that point at user space memory.
If you don't know what to do here, say N.
if IOMMUFD
config IOMMUFD_TEST
bool "IOMMU Userspace API Test support"
depends on DEBUG_KERNEL
depends on FAULT_INJECTION
depends on RUNTIME_TESTING_MENU
default n
help
This is dangerous, do not enable unless running
tools/testing/selftests/iommu
endif
# SPDX-License-Identifier: GPL-2.0-only
iommufd-y := \
device.o \
hw_pagetable.o \
io_pagetable.o \
ioas.o \
main.o \
pages.o \
vfio_compat.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
*/
#ifndef __IOMMUFD_DOUBLE_SPAN_H
#define __IOMMUFD_DOUBLE_SPAN_H
#include <linux/interval_tree.h>
/*
* This is a variation of the general interval_tree_span_iter that computes the
* spans over the union of two different interval trees. Used ranges are broken
* up and reported based on the tree that provides the interval. The first span
* always takes priority. Like interval_tree_span_iter it is greedy and the same
* value of is_used will not repeat on two iteration cycles.
*/
struct interval_tree_double_span_iter {
struct rb_root_cached *itrees[2];
struct interval_tree_span_iter spans[2];
union {
unsigned long start_hole;
unsigned long start_used;
};
union {
unsigned long last_hole;
unsigned long last_used;
};
/* 0 = hole, 1 = used span[0], 2 = used span[1], -1 done iteration */
int is_used;
};
void interval_tree_double_span_iter_update(
struct interval_tree_double_span_iter *iter);
void interval_tree_double_span_iter_first(
struct interval_tree_double_span_iter *iter,
struct rb_root_cached *itree1, struct rb_root_cached *itree2,
unsigned long first_index, unsigned long last_index);
void interval_tree_double_span_iter_next(
struct interval_tree_double_span_iter *iter);
static inline bool
interval_tree_double_span_iter_done(struct interval_tree_double_span_iter *state)
{
return state->is_used == -1;
}
#define interval_tree_for_each_double_span(span, itree1, itree2, first_index, \
last_index) \
for (interval_tree_double_span_iter_first(span, itree1, itree2, \
first_index, last_index); \
!interval_tree_double_span_iter_done(span); \
interval_tree_double_span_iter_next(span))
#endif
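The implementation of the double span iterator is not visible in this view, so the following is only a brute-force userspace model of the behaviour the header comment describes: spans over the union of two interval sets, the first set taking priority, and no two consecutive spans sharing the same is_used value. `struct range_model`, `struct span_model` and `model_double_spans()` are hypothetical names, and the model walks every index instead of using interval trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range, standing in for one interval tree node. */
struct range_model {
	unsigned long start, last;
};

struct span_model {
	unsigned long start, last;
	int is_used; /* 0 = hole, 1 = used by set 0, 2 = used by set 1 only */
};

static bool model_covered(const struct range_model *set, size_t n,
			  unsigned long idx)
{
	for (size_t i = 0; i < n; i++)
		if (idx >= set[i].start && idx <= set[i].last)
			return true;
	return false;
}

static int model_classify(const struct range_model *s0, size_t n0,
			  const struct range_model *s1, size_t n1,
			  unsigned long idx)
{
	if (model_covered(s0, n0, idx))
		return 1; /* the first set takes priority */
	if (model_covered(s1, n1, idx))
		return 2;
	return 0;
}

/* Fill out[] with greedy spans over [first, last]; returns the span count. */
static size_t model_double_spans(const struct range_model *s0, size_t n0,
				 const struct range_model *s1, size_t n1,
				 unsigned long first, unsigned long last,
				 struct span_model *out, size_t max_out)
{
	size_t n = 0;

	assert(first <= last);
	for (unsigned long idx = first;; idx++) {
		int cls = model_classify(s0, n0, s1, n1, idx);

		if (n && out[n - 1].is_used == cls) {
			out[n - 1].last = idx; /* extend the current span */
		} else {
			assert(n < max_out);
			out[n++] = (struct span_model){ idx, idx, cls };
		}
		if (idx == last)
			break; /* avoids wrapping when last == ULONG_MAX */
	}
	return n;
}
```

With set 0 holding [2,4] and set 1 holding [4,7] and [10,11], iterating 0..12 in this model gives a hole at 0..1, a span from set 0 at 2..4, a span from set 1 at 5..7 (index 4 is claimed by set 0), a hole at 8..9, set 1 again at 10..11, and a final hole at 12.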
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
*/
#include <linux/iommu.h>
#include "iommufd_private.h"
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
{
struct iommufd_hw_pagetable *hwpt =
container_of(obj, struct iommufd_hw_pagetable, obj);
WARN_ON(!list_empty(&hwpt->devices));
iommu_domain_free(hwpt->domain);
refcount_dec(&hwpt->ioas->obj.users);
mutex_destroy(&hwpt->devices_lock);
}
/**
* iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
* @ictx: iommufd context
* @ioas: IOAS to associate the domain with
* @dev: Device to get an iommu_domain for
*
* Allocate a new iommu_domain and return it as a hw_pagetable.
*/
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
struct device *dev)
{
struct iommufd_hw_pagetable *hwpt;
int rc;
hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
if (IS_ERR(hwpt))
return hwpt;
hwpt->domain = iommu_domain_alloc(dev->bus);
if (!hwpt->domain) {
rc = -ENOMEM;
goto out_abort;
}
INIT_LIST_HEAD(&hwpt->devices);
INIT_LIST_HEAD(&hwpt->hwpt_item);
mutex_init(&hwpt->devices_lock);
/* Pairs with iommufd_hw_pagetable_destroy() */
refcount_inc(&ioas->obj.users);
hwpt->ioas = ioas;
return hwpt;
out_abort:
iommufd_object_abort(ictx, &hwpt->obj);
return ERR_PTR(rc);
}
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES.
*
*/
#ifndef __IO_PAGETABLE_H
#define __IO_PAGETABLE_H
#include <linux/interval_tree.h>
#include <linux/mutex.h>
#include <linux/kref.h>
#include <linux/xarray.h>
#include "iommufd_private.h"
struct iommu_domain;
/*
* Each io_pagetable is composed of intervals of areas which cover regions of
* the iova that are backed by something. iova not covered by areas is not
* populated in the page table. Each area is fully populated with pages.
*
* iovas are in byte units, but must be iopt->iova_alignment aligned.
*
* pages can be NULL, this means some other thread is still working on setting
* up or tearing down the area. When observed under the write side of the
* domain_rwsem a NULL pages must mean the area is still being setup and no
* domains are filled.
*
* storage_domain points at an arbitrary iommu_domain that is holding the PFNs
* for this area. It is locked by the pages->mutex. This simplifies the locking
* as the pages code can rely on the storage_domain without having to get the
* iopt->domains_rwsem.
*
* The io_pagetable::iova_rwsem protects node
* The iopt_pages::mutex protects pages_node
* iopt and iommu_prot are immutable
* The pages::mutex protects num_accesses
*/
struct iopt_area {
struct interval_tree_node node;
struct interval_tree_node pages_node;
struct io_pagetable *iopt;
struct iopt_pages *pages;
struct iommu_domain *storage_domain;
/* How many bytes into the first page the area starts */
unsigned int page_offset;
/* IOMMU_READ, IOMMU_WRITE, etc */
int iommu_prot;
bool prevent_access : 1;
unsigned int num_accesses;
};
struct iopt_allowed {
struct interval_tree_node node;
};
struct iopt_reserved {
struct interval_tree_node node;
void *owner;
};
int iopt_area_fill_domains(struct iopt_area *area, struct iopt_pages *pages);
void iopt_area_unfill_domains(struct iopt_area *area, struct iopt_pages *pages);
int iopt_area_fill_domain(struct iopt_area *area, struct iommu_domain *domain);
void iopt_area_unfill_domain(struct iopt_area *area, struct iopt_pages *pages,
struct iommu_domain *domain);
void iopt_area_unmap_domain(struct iopt_area *area,
struct iommu_domain *domain);
static inline unsigned long iopt_area_index(struct iopt_area *area)
{
return area->pages_node.start;
}
static inline unsigned long iopt_area_last_index(struct iopt_area *area)
{
return area->pages_node.last;
}
static inline unsigned long iopt_area_iova(struct iopt_area *area)
{
return area->node.start;
}
static inline unsigned long iopt_area_last_iova(struct iopt_area *area)
{
return area->node.last;
}
static inline size_t iopt_area_length(struct iopt_area *area)
{
return (area->node.last - area->node.start) + 1;
}
/*
* Number of bytes from the start of the iopt_pages that the iova begins.
* iopt_area_start_byte() / PAGE_SIZE encodes the starting page index
* iopt_area_start_byte() % PAGE_SIZE encodes the offset within that page
*/
static inline unsigned long iopt_area_start_byte(struct iopt_area *area,
unsigned long iova)
{
if (IS_ENABLED(CONFIG_IOMMUFD_TEST))
WARN_ON(iova < iopt_area_iova(area) ||
iova > iopt_area_last_iova(area));
return (iova - iopt_area_iova(area)) + area->page_offset +
iopt_area_index(area) * PAGE_SIZE;
}
static inline unsigned long iopt_area_iova_to_index(struct iopt_area *area,
unsigned long iova)
{
return iopt_area_start_byte(area, iova) / PAGE_SIZE;
}
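As a worked example of the arithmetic in iopt_area_start_byte() and iopt_area_iova_to_index(), the sketch below repeats the same formula on a reduced structure in userspace. `struct area_model`, `MODEL_PAGE_SIZE` and the function names are hypothetical, and a 4096-byte page size is an assumption made only for this illustration.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL /* assumed page size for this illustration */

/* Hypothetical reduced view of struct iopt_area. */
struct area_model {
	unsigned long iova;       /* first IOVA of the area (node.start) */
	unsigned long last_iova;  /* last IOVA of the area (node.last) */
	unsigned long index;      /* first page index in iopt_pages (pages_node.start) */
	unsigned int page_offset; /* bytes into the first page where the area starts */
};

/* Byte offset of @iova measured from the start of the backing iopt_pages. */
static unsigned long model_start_byte(const struct area_model *a,
				      unsigned long iova)
{
	assert(iova >= a->iova && iova <= a->last_iova);
	return (iova - a->iova) + a->page_offset + a->index * MODEL_PAGE_SIZE;
}

/* Page index within the backing iopt_pages that holds @iova. */
static unsigned long model_iova_to_index(const struct area_model *a,
					 unsigned long iova)
{
	return model_start_byte(a, iova) / MODEL_PAGE_SIZE;
}
```

For an area with iova 0x100000, last_iova 0x103fff, index 2 and page_offset 0x200, the IOVA 0x100f00 gives 0xf00 + 0x200 + 2 * 4096 = 0x3100 bytes, which is page index 3 at offset 0x100 within that page.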
#define __make_iopt_iter(name) \
static inline struct iopt_##name *iopt_##name##_iter_first( \
struct io_pagetable *iopt, unsigned long start, \
unsigned long last) \
{ \
struct interval_tree_node *node; \
\
lockdep_assert_held(&iopt->iova_rwsem); \
node = interval_tree_iter_first(&iopt->name##_itree, start, \
last); \
if (!node) \
return NULL; \
return container_of(node, struct iopt_##name, node); \
} \
static inline struct iopt_##name *iopt_##name##_iter_next( \
struct iopt_##name *last_node, unsigned long start, \
unsigned long last) \
{ \
struct interval_tree_node *node; \
\
node = interval_tree_iter_next(&last_node->node, start, last); \
if (!node) \
return NULL; \
return container_of(node, struct iopt_##name, node); \
}
__make_iopt_iter(area)
__make_iopt_iter(allowed)
__make_iopt_iter(reserved)
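The `__make_iopt_iter()` macro stamps out one typed iterator pair per tree by combining token pasting with `container_of()`. A stripped-down user-space sketch of the same technique, assuming a plain linked list in place of the interval tree; every `sketch_*` name is hypothetical:

```c
#include <stddef.h>

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic node; a singly linked list replaces the interval tree here. */
struct sketch_node {
	unsigned long start, last; /* inclusive range */
	struct sketch_node *next;
};

/* Untyped lookup: first node that overlaps [start, last], or NULL. */
static inline struct sketch_node *sketch_first(struct sketch_node *head,
					       unsigned long start,
					       unsigned long last)
{
	for (; head; head = head->next)
		if (head->start <= last && start <= head->last)
			return head;
	return NULL;
}

struct sketch_area { int id; struct sketch_node node; };
struct sketch_reserved { const char *owner; struct sketch_node node; };

/* Generate a typed wrapper that returns the structure embedding the node. */
#define SKETCH_MAKE_ITER(name)                                                \
	static inline struct sketch_##name *sketch_##name##_iter_first(      \
		struct sketch_node *head, unsigned long start,                \
		unsigned long last)                                           \
	{                                                                     \
		struct sketch_node *node = sketch_first(head, start, last);   \
		if (!node)                                                    \
			return NULL;                                          \
		return sketch_container_of(node, struct sketch_##name, node); \
	}

SKETCH_MAKE_ITER(area)
SKETCH_MAKE_ITER(reserved)
```

Each expansion yields a function such as `sketch_area_iter_first()` whose return type matches the embedding structure, so callers never touch the raw node.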
struct iopt_area_contig_iter {
unsigned long cur_iova;
unsigned long last_iova;
struct iopt_area *area;
};
struct iopt_area *iopt_area_contig_init(struct iopt_area_contig_iter *iter,
struct io_pagetable *iopt,
unsigned long iova,
unsigned long last_iova);
struct iopt_area *iopt_area_contig_next(struct iopt_area_contig_iter *iter);
static inline bool iopt_area_contig_done(struct iopt_area_contig_iter *iter)
{
return iter->area && iter->last_iova <= iopt_area_last_iova(iter->area);
}
/*
* Iterate over a contiguous list of areas that span the [iova, last_iova]
* range. The caller must check iopt_area_contig_done() after the loop to see
* whether the areas covered the whole range without gaps.
*/
#define iopt_for_each_contig_area(iter, area, iopt, iova, last_iova) \
for (area = iopt_area_contig_init(iter, iopt, iova, last_iova); area; \
area = iopt_area_contig_next(iter))
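The contiguous iterator stops early when it meets a gap, which is why the caller has to test for completion after the loop. A simplified sketch of that check over a sorted array, assuming non-overlapping inclusive ranges; `struct contig_range` and `contig_covers()` are hypothetical and only model the idea:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct contig_range {
	unsigned long start;
	unsigned long last; /* inclusive */
};

/*
 * Assumption: r[] is sorted by start and its entries do not overlap.
 * Returns true only if the ranges cover [iova, last_iova] without a gap.
 */
static inline bool contig_covers(const struct contig_range *r, size_t n,
				 unsigned long iova, unsigned long last_iova)
{
	unsigned long cur = iova;
	size_t i;

	assert(iova <= last_iova);
	for (i = 0; i < n; i++) {
		if (r[i].last < cur)
			continue;     /* entirely before the cursor */
		if (r[i].start > cur)
			return false; /* gap: iteration would stop here */
		if (r[i].last >= last_iova)
			return true;  /* reached the end of the request */
		cur = r[i].last + 1;
	}
	return false; /* ran out of ranges before last_iova */
}
```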
enum {
IOPT_PAGES_ACCOUNT_NONE = 0,
IOPT_PAGES_ACCOUNT_USER = 1,
IOPT_PAGES_ACCOUNT_MM = 2,
};
/*
* This holds a pinned page list for multiple areas of IO address space. The
* pages always originate from a linear chunk of userspace VA. Multiple
* io_pagetable's, through their iopt_area's, can share a single iopt_pages
* which avoids multi-pinning and double accounting of page consumption.
*
* Indexes in this structure are measured in PAGE_SIZE units, are 0 based from
* the start of the uptr and extend to npages. Pages are pinned dynamically
* according to the intervals in the access_itree and domains_itree; npinned
* records the current number of pages pinned.
*/
struct iopt_pages {
struct kref kref;
struct mutex mutex;
size_t npages;
size_t npinned;
size_t last_npinned;
struct task_struct *source_task;
struct mm_struct *source_mm;
struct user_struct *source_user;
void __user *uptr;
bool writable:1;
u8 account_mode;
struct xarray pinned_pfns;
/* Of iopt_pages_access::node */
struct rb_root_cached access_itree;
/* Of iopt_area::pages_node */
struct rb_root_cached domains_itree;
};
struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
bool writable);
void iopt_release_pages(struct kref *kref);
static inline void iopt_put_pages(struct iopt_pages *pages)
{
kref_put(&pages->kref, iopt_release_pages);
}
void iopt_pages_fill_from_xarray(struct iopt_pages *pages, unsigned long start,
unsigned long last, struct page **out_pages);
int iopt_pages_fill_xarray(struct iopt_pages *pages, unsigned long start,
unsigned long last, struct page **out_pages);
void iopt_pages_unfill_xarray(struct iopt_pages *pages, unsigned long start,
unsigned long last);
int iopt_area_add_access(struct iopt_area *area, unsigned long start,
unsigned long last, struct page **out_pages,
unsigned int flags);
void iopt_area_remove_access(struct iopt_area *area, unsigned long start,
unsigned long last);
int iopt_pages_rw_access(struct iopt_pages *pages, unsigned long start_byte,
void *data, unsigned long length, unsigned int flags);
/*
* Each interval represents an active iopt_access_pages(). It acts as an
* interval lock that keeps the PFNs pinned and stored in the xarray.
*/
struct iopt_pages_access {
struct interval_tree_node node;
unsigned int users;
};
#endif
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
*/
#include <linux/interval_tree.h>
#include <linux/iommufd.h>
#include <linux/iommu.h>
#include <uapi/linux/iommufd.h>
#include "io_pagetable.h"
void iommufd_ioas_destroy(struct iommufd_object *obj)
{
struct iommufd_ioas *ioas = container_of(obj, struct iommufd_ioas, obj);
int rc;
rc = iopt_unmap_all(&ioas->iopt, NULL);
WARN_ON(rc && rc != -ENOENT);
iopt_destroy_table(&ioas->iopt);
mutex_destroy(&ioas->mutex);
}
struct iommufd_ioas *iommufd_ioas_alloc(struct iommufd_ctx *ictx)
{
struct iommufd_ioas *ioas;
ioas = iommufd_object_alloc(ictx, ioas, IOMMUFD_OBJ_IOAS);
if (IS_ERR(ioas))
return ioas;
iopt_init_table(&ioas->iopt);
INIT_LIST_HEAD(&ioas->hwpt_list);
mutex_init(&ioas->mutex);
return ioas;
}
int iommufd_ioas_alloc_ioctl(struct iommufd_ucmd *ucmd)
{
struct iommu_ioas_alloc *cmd = ucmd->cmd;
struct iommufd_ioas *ioas;
int rc;
if (cmd->flags)
return -EOPNOTSUPP;
ioas = iommufd_ioas_alloc(ucmd->ictx);
if (IS_ERR(ioas))
return PTR_ERR(ioas);
cmd->out_ioas_id = ioas->obj.id;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
if (rc)
goto out_table;
iommufd_object_finalize(ucmd->ictx, &ioas->obj);
return 0;
out_table:
iommufd_object_abort_and_destroy(ucmd->ictx, &ioas->obj);
return rc;
}
int iommufd_ioas_iova_ranges(struct iommufd_ucmd *ucmd)
{
struct iommu_iova_range __user *ranges;
struct iommu_ioas_iova_ranges *cmd = ucmd->cmd;
struct iommufd_ioas *ioas;
struct interval_tree_span_iter span;
u32 max_iovas;
int rc;
if (cmd->__reserved)
return -EOPNOTSUPP;
ioas = iommufd_get_ioas(ucmd, cmd->ioas_id);
if (IS_ERR(ioas))
return PTR_ERR(ioas);
down_read(&ioas->iopt.iova_rwsem);
max_iovas = cmd->num_iovas;
ranges = u64_to_user_ptr(cmd->allowed_iovas);
cmd->num_iovas = 0;
cmd->out_iova_alignment = ioas->iopt.iova_alignment;
interval_tree_for_each_span(&span, &ioas->iopt.reserved_itree, 0,
ULONG_MAX) {
if (!span.is_hole)
continue;
if (cmd->num_iovas < max_iovas) {
struct iommu_iova_range elm = {
.start = span.start_hole,
.last = span.last_hole,
};
if (copy_to_user(&ranges[cmd->num_iovas], &elm,
sizeof(elm))) {
rc = -EFAULT;
goto out_put;
}
}
cmd->num_iovas++;
}
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
if (rc)
goto out_put;
if (cmd->num_iovas > max_iovas)
rc = -EMSGSIZE;
out_put:
up_read(&ioas->iopt.iova_rwsem);
iommufd_put_object(&ioas->obj);
return rc;
}
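`iommufd_ioas_iova_ranges()` keeps counting holes after the user buffer is full and then returns -EMSGSIZE, so userspace can retry with a buffer of the reported size. A user-space sketch of that bounded-buffer contract, assuming a sorted, non-overlapping array in place of the reserved interval tree; `struct hole_range`, `hole_emit()` and `hole_report()` are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

struct hole_range {
	unsigned long start;
	unsigned long last; /* inclusive */
};

/* Store the hole only while there is room, but always count it. */
static inline void hole_emit(struct hole_range *out, unsigned int max_out,
			     unsigned int *num, unsigned long start,
			     unsigned long last)
{
	if (*num < max_out) {
		out[*num].start = start;
		out[*num].last = last;
	}
	(*num)++;
}

/*
 * Assumption: reserved[] is sorted by start and its entries do not overlap.
 * Returns 0, or -EMSGSIZE when more holes exist than fit in out[].
 * *num always receives the total number of holes.
 */
static inline int hole_report(const struct hole_range *reserved,
			      size_t nreserved, struct hole_range *out,
			      unsigned int max_out, unsigned int *num)
{
	unsigned long cur = 0;
	size_t i;

	assert(num);
	*num = 0;
	for (i = 0; i < nreserved; i++) {
		if (reserved[i].start > cur)
			hole_emit(out, max_out, num, cur, reserved[i].start - 1);
		if (reserved[i].last == ULONG_MAX)
			return *num > max_out ? -EMSGSIZE : 0;
		cur = reserved[i].last + 1;
	}
	hole_emit(out, max_out, num, cur, ULONG_MAX);
	return *num > max_out ? -EMSGSIZE : 0;
}
```

A caller would typically call once with a small buffer, read the total from `*num` on -EMSGSIZE, and call again with a buffer of that size.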
static int iommufd_ioas_load_iovas(struct rb_root_cached *itree,
struct iommu_iova_range __user *ranges,
u32 num)
{
u32 i;
for (i = 0; i != num; i++) {
struct iommu_iova_range range;
struct iopt_allowed *allowed;
if (copy_from_user(&range, ranges + i, sizeof(range)))
return -EFAULT;
if (range.start >= range.last)
return -EINVAL;
if (interval_tree_iter_first(itree, range.start, range.last))
return -EINVAL;
allowed = kzalloc(sizeof(*allowed), GFP_KERNEL_ACCOUNT);
if (!allowed)
return -ENOMEM;
allowed->node.start = range.start;
allowed->node.last = range.last;
interval_tree_insert(&allowed->node, itree);
}
return 0;
}
int iommufd_ioas_allow_iovas(struct iommufd_ucmd *ucmd)
{
struct iommu_ioas_allow_iovas *cmd = ucmd->cmd;
struct rb_root_cached allowed_iova = RB_ROOT_CACHED;
struct interval_tree_node *node;
struct iommufd_ioas *ioas;
struct io_pagetable *iopt;
int rc = 0;
if (cmd->__reserved)
return -EOPNOTSUPP;
ioas = iommufd_get_ioas(ucmd, cmd->ioas_id);
if (IS_ERR(ioas))
return PTR_ERR(ioas);
iopt = &ioas->iopt;
rc = iommufd_ioas_load_iovas(&allowed_iova,
u64_to_user_ptr(cmd->allowed_iovas),
cmd->num_iovas);
if (rc)
goto out_free;
/*
* We want the allowed tree update to be atomic, so we have to keep the
* original nodes around, and keep track of the new nodes as we allocate
* memory for them. The simplest solution is to have a new/old tree and
* then swap new for old. On success we free the old tree, on failure we
* free the new tree.
*/
rc = iopt_set_allow_iova(iopt, &allowed_iova);
out_free:
while ((node = interval_tree_iter_first(&allowed_iova, 0, ULONG_MAX))) {
interval_tree_remove(node, &allowed_iova);
kfree(container_of(node, struct iopt_allowed, node));
}
iommufd_put_object(&ioas->obj);
return rc;
}
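The comment in `iommufd_ioas_allow_iovas()` describes a build-then-swap update: the new set is assembled on the side, swapped in on success, and whichever set ends up on the side is freed. A simplified user-space sketch of that pattern, assuming a heap array in place of the interval tree; `struct allow_set` and `allow_set_replace()` are hypothetical, and the extra checks done by `iopt_set_allow_iova()` are not modelled:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct allow_range {
	unsigned long start;
	unsigned long last; /* inclusive */
};

struct allow_set {
	struct allow_range *v;
	size_t n;
};

/* Replace *cur with the ranges in in[]; *cur is untouched on failure. */
static inline int allow_set_replace(struct allow_set *cur,
				    const struct allow_range *in, size_t n)
{
	struct allow_set fresh = { NULL, 0 };
	struct allow_set old;
	int rc = 0;
	size_t i, j;

	assert(cur);
	if (n) {
		fresh.v = calloc(n, sizeof(*fresh.v));
		if (!fresh.v)
			return -ENOMEM;
	}
	for (i = 0; i < n; i++) {
		/* Same validity rule as the loader above: start < last. */
		if (in[i].start >= in[i].last) {
			rc = -EINVAL;
			goto out_free;
		}
		/* Reject a range that overlaps one already accepted. */
		for (j = 0; j < fresh.n; j++) {
			if (in[i].start <= fresh.v[j].last &&
			    fresh.v[j].start <= in[i].last) {
				rc = -EINVAL;
				goto out_free;
			}
		}
		fresh.v[fresh.n++] = in[i];
	}
	/* Commit: after the swap, "fresh" holds the previous set. */
	old = *cur;
	*cur = fresh;
	fresh = old;
out_free:
	/* Success frees the old set, failure frees the new one. */
	free(fresh.v);
	return rc;
}
```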
static int conv_iommu_prot(u32 map_flags)
{
/*
* We provide no manual cache coherency ioctls to userspace and most
* architectures make the CPU ops for cache flushing privileged.
* Therefore we require the underlying IOMMU to support CPU coherent
* operation. Support for IOMMU_CACHE is enforced by the
* IOMMU_CAP_CACHE_COHERENCY test during bind.
*/
int iommu_prot = IOMMU_CACHE;
if (map_flags & IOMMU_IOAS_MAP_WRITEABLE)
iommu_prot |= IOMMU_WRITE;
if (map_flags & IOMMU_IOAS_MAP_READABLE)
iommu_prot |= IOMMU_READ;
return iommu_prot;
}
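`conv_iommu_prot()` always starts from the cache-coherent bit and adds read and write permission on request, while the callers reject unknown flag bits first. A self-contained sketch that folds both steps into one function; the `SK_*` constants carry illustrative values only and are assumptions, since the real ones come from the kernel headers:

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in kernel headers. */
enum {
	SK_MAP_FIXED_IOVA = 1 << 0,
	SK_MAP_WRITEABLE = 1 << 1,
	SK_MAP_READABLE = 1 << 2,
};

enum {
	SK_PROT_READ = 1 << 0,
	SK_PROT_WRITE = 1 << 1,
	SK_PROT_CACHE = 1 << 2,
};

/* Returns the protection bits, or -EOPNOTSUPP for unknown map flags. */
static inline int sk_conv_prot(uint32_t map_flags)
{
	const uint32_t known = SK_MAP_FIXED_IOVA | SK_MAP_WRITEABLE |
			       SK_MAP_READABLE;
	int prot = SK_PROT_CACHE; /* coherent operation is always required */

	if (map_flags & ~known)
		return -EOPNOTSUPP;
	if (map_flags & SK_MAP_WRITEABLE)
		prot |= SK_PROT_WRITE;
	if (map_flags & SK_MAP_READABLE)
		prot |= SK_PROT_READ;
	return prot;
}
```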
int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
{
struct iommu_ioas_map *cmd = ucmd->cmd;
unsigned long iova = cmd->iova;
struct iommufd_ioas *ioas;
unsigned int flags = 0;
int rc;
if ((cmd->flags &
~(IOMMU_IOAS_MAP_FIXED_IOVA | IOMMU_IOAS_MAP_WRITEABLE |
IOMMU_IOAS_MAP_READABLE)) ||
cmd->__reserved)
return -EOPNOTSUPP;
if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX)
return -EOVERFLOW;
ioas = iommufd_get_ioas(ucmd, cmd->ioas_id);
if (IS_ERR(ioas))
return PTR_ERR(ioas);
if (!(cmd->flags & IOMMU_IOAS_MAP_FIXED_IOVA))
flags = IOPT_ALLOC_IOVA;
rc = iopt_map_user_pages(ucmd->ictx, &ioas->iopt, &iova,
u64_to_user_ptr(cmd->user_va), cmd->length,
conv_iommu_prot(cmd->flags), flags);
if (rc)
goto out_put;
cmd->iova = iova;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
out_put:
iommufd_put_object(&ioas->obj);
return rc;
}
int iommufd_ioas_copy(struct iommufd_ucmd *ucmd)
{
struct iommu_ioas_copy *cmd = ucmd->cmd;
struct iommufd_ioas *src_ioas;
struct iommufd_ioas *dst_ioas;
unsigned int flags = 0;
LIST_HEAD(pages_list);
unsigned long iova;
int rc;
iommufd_test_syz_conv_iova_id(ucmd, cmd->src_ioas_id, &cmd->src_iova,
&cmd->flags);
if ((cmd->flags &
~(IOMMU_IOAS_MAP_FIXED_IOVA | IOMMU_IOAS_MAP_WRITEABLE |
IOMMU_IOAS_MAP_READABLE)))
return -EOPNOTSUPP;
if (cmd->length >= ULONG_MAX || cmd->src_iova >= ULONG_MAX ||
cmd->dst_iova >= ULONG_MAX)
return -EOVERFLOW;
src_ioas = iommufd_get_ioas(ucmd, cmd->src_ioas_id);
if (IS_ERR(src_ioas))
return PTR_ERR(src_ioas);
rc = iopt_get_pages(&src_ioas->iopt, cmd->src_iova, cmd->length,
&pages_list);
iommufd_put_object(&src_ioas->obj);
if (rc)
return rc;
dst_ioas = iommufd_get_ioas(ucmd, cmd->dst_ioas_id);
if (IS_ERR(dst_ioas)) {
rc = PTR_ERR(dst_ioas);
goto out_pages;
}
if (!(cmd->flags & IOMMU_IOAS_MAP_FIXED_IOVA))
flags = IOPT_ALLOC_IOVA;
iova = cmd->dst_iova;
rc = iopt_map_pages(&dst_ioas->iopt, &pages_list, cmd->length, &iova,
conv_iommu_prot(cmd->flags), flags);
if (rc)
goto out_put_dst;
cmd->dst_iova = iova;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
out_put_dst:
iommufd_put_object(&dst_ioas->obj);
out_pages:
iopt_free_pages_list(&pages_list);
return rc;
}
int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
{
struct iommu_ioas_unmap *cmd = ucmd->cmd;
struct iommufd_ioas *ioas;
unsigned long unmapped = 0;
int rc;
ioas = iommufd_get_ioas(ucmd, cmd->ioas_id);
if (IS_ERR(ioas))
return PTR_ERR(ioas);
if (cmd->iova == 0 && cmd->length == U64_MAX) {
rc = iopt_unmap_all(&ioas->iopt, &unmapped);
if (rc)
goto out_put;
} else {
if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX) {
rc = -EOVERFLOW;
goto out_put;
}
rc = iopt_unmap_iova(&ioas->iopt, cmd->iova, cmd->length,
&unmapped);
if (rc)
goto out_put;
}
cmd->length = unmapped;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
out_put:
iommufd_put_object(&ioas->obj);
return rc;
}
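`iommufd_ioas_unmap()` treats `iova == 0` together with `length == U64_MAX` as "unmap everything" and applies the overflow check only to ordinary requests. A small sketch of that classification step; `sk_check_unmap()` is a hypothetical helper and performs no unmapping:

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Classify an unmap request the way the handler above does.
 * Returns 0 and sets *unmap_all, or -EOVERFLOW for an out-of-range request.
 */
static inline int sk_check_unmap(uint64_t iova, uint64_t length,
				 bool *unmap_all)
{
	*unmap_all = false;
	if (iova == 0 && length == UINT64_MAX) {
		*unmap_all = true; /* special case: whole address space */
		return 0;
	}
	if (iova >= ULONG_MAX || length >= ULONG_MAX)
		return -EOVERFLOW;
	return 0;
}
```

The order of the two tests matters: checking for overflow first would reject the unmap-all encoding, because its length is at or above `ULONG_MAX`.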
int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx)
{
if (cmd->object_id)
return -EOPNOTSUPP;
if (cmd->op == IOMMU_OPTION_OP_GET) {
cmd->val64 = ictx->account_mode == IOPT_PAGES_ACCOUNT_MM;
return 0;
}
if (cmd->op == IOMMU_OPTION_OP_SET) {
int rc = 0;
if (!capable(CAP_SYS_RESOURCE))
return -EPERM;
xa_lock(&ictx->objects);
if (!xa_empty(&ictx->objects)) {
rc = -EBUSY;
} else {
if (cmd->val64 == 0)
ictx->account_mode = IOPT_PAGES_ACCOUNT_USER;
else if (cmd->val64 == 1)
ictx->account_mode = IOPT_PAGES_ACCOUNT_MM;
else
rc = -EINVAL;
}
xa_unlock(&ictx->objects);
return rc;
}
return -EOPNOTSUPP;
}
static int iommufd_ioas_option_huge_pages(struct iommu_option *cmd,
struct iommufd_ioas *ioas)
{
if (cmd->op == IOMMU_OPTION_OP_GET) {
cmd->val64 = !ioas->iopt.disable_large_pages;
return 0;
}
if (cmd->op == IOMMU_OPTION_OP_SET) {
if (cmd->val64 == 0)
return iopt_disable_large_pages(&ioas->iopt);
if (cmd->val64 == 1) {
iopt_enable_large_pages(&ioas->iopt);
return 0;
}
return -EINVAL;
}
return -EOPNOTSUPP;
}
int iommufd_ioas_option(struct iommufd_ucmd *ucmd)
{
struct iommu_option *cmd = ucmd->cmd;
struct iommufd_ioas *ioas;
int rc = 0;
if (cmd->__reserved)
return -EOPNOTSUPP;
ioas = iommufd_get_ioas(ucmd, cmd->object_id);
if (IS_ERR(ioas))
return PTR_ERR(ioas);
switch (cmd->option_id) {
case IOMMU_OPTION_HUGE_PAGES:
rc = iommufd_ioas_option_huge_pages(cmd, ioas);
break;
default:
rc = -EOPNOTSUPP;
}
iommufd_put_object(&ioas->obj);
return rc;
}
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES.
*/
#ifndef _UAPI_IOMMUFD_TEST_H
#define _UAPI_IOMMUFD_TEST_H
#include <linux/types.h>
#include <linux/iommufd.h>
enum {
IOMMU_TEST_OP_ADD_RESERVED = 1,
IOMMU_TEST_OP_MOCK_DOMAIN,
IOMMU_TEST_OP_MD_CHECK_MAP,
IOMMU_TEST_OP_MD_CHECK_REFS,
IOMMU_TEST_OP_CREATE_ACCESS,
IOMMU_TEST_OP_DESTROY_ACCESS_PAGES,
IOMMU_TEST_OP_ACCESS_PAGES,
IOMMU_TEST_OP_ACCESS_RW,
IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
};
enum {
MOCK_APERTURE_START = 1UL << 24,
MOCK_APERTURE_LAST = (1UL << 31) - 1,
};
enum {
MOCK_FLAGS_ACCESS_WRITE = 1 << 0,
MOCK_FLAGS_ACCESS_SYZ = 1 << 16,
};
enum {
MOCK_ACCESS_RW_WRITE = 1 << 0,
MOCK_ACCESS_RW_SLOW_PATH = 1 << 2,
};
enum {
MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES = 1 << 0,
};
struct iommu_test_cmd {
	__u32 size;
	__u32 op;
	__u32 id;
	__u32 __reserved;
	union {
		struct {
			__aligned_u64 start;
			__aligned_u64 length;
		} add_reserved;
		struct {
			__u32 out_device_id;
			__u32 out_hwpt_id;
		} mock_domain;
		struct {
			__aligned_u64 iova;
			__aligned_u64 length;
			__aligned_u64 uptr;
		} check_map;
		struct {
			__aligned_u64 length;
			__aligned_u64 uptr;
			__u32 refs;
		} check_refs;
		struct {
			__u32 out_access_fd;
			__u32 flags;
		} create_access;
		struct {
			__u32 access_pages_id;
		} destroy_access_pages;
		struct {
			__u32 flags;
			__u32 out_access_pages_id;
			__aligned_u64 iova;
			__aligned_u64 length;
			__aligned_u64 uptr;
		} access_pages;
		struct {
			__aligned_u64 iova;
			__aligned_u64 length;
			__aligned_u64 uptr;
			__u32 flags;
		} access_rw;
		struct {
			__u32 limit;
		} memory_limit;
	};
	__u32 last;
};
#define IOMMU_TEST_CMD _IO(IOMMUFD_TYPE, IOMMUFD_CMD_BASE + 32)
#endif
@@ -27,4 +27,62 @@ extern struct interval_tree_node *
interval_tree_iter_next(struct interval_tree_node *node,
unsigned long start, unsigned long last);
/**
* struct interval_tree_span_iter - Find used and unused spans.
* @start_hole: Start of an interval for a hole when is_hole == 1
* @last_hole: Inclusive end of an interval for a hole when is_hole == 1
* @start_used: Start of a used interval when is_hole == 0
* @last_used: Inclusive end of a used interval when is_hole == 0
* @is_hole: 0 == used, 1 == is_hole, -1 == done iteration
*
* This iterator travels over spans in an interval tree. It does not return
* nodes but classifies each span as either a hole, where no nodes intersect, or
* a used, which is fully covered by nodes. Each iteration step toggles between
* hole and used until the entire range is covered. The returned spans always
* fully cover the requested range.
*
* The iterator is greedy: it always returns the largest hole or used span
* possible, consolidating all consecutive nodes.
*
* Use interval_tree_span_iter_done() to detect end of iteration.
*/
struct interval_tree_span_iter {
/* private: not for use by the caller */
struct interval_tree_node *nodes[2];
unsigned long first_index;
unsigned long last_index;
/* public: */
union {
unsigned long start_hole;
unsigned long start_used;
};
union {
unsigned long last_hole;
unsigned long last_used;
};
int is_hole;
};
void interval_tree_span_iter_first(struct interval_tree_span_iter *state,
struct rb_root_cached *itree,
unsigned long first_index,
unsigned long last_index);
void interval_tree_span_iter_advance(struct interval_tree_span_iter *iter,
struct rb_root_cached *itree,
unsigned long new_index);
void interval_tree_span_iter_next(struct interval_tree_span_iter *state);
static inline bool
interval_tree_span_iter_done(struct interval_tree_span_iter *state)
{
return state->is_hole == -1;
}
#define interval_tree_for_each_span(span, itree, first_index, last_index) \
for (interval_tree_span_iter_first(span, itree, \
first_index, last_index); \
!interval_tree_span_iter_done(span); \
interval_tree_span_iter_next(span))
#endif /* _LINUX_INTERVAL_TREE_H */
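The span iterator documented above alternates between hole and used spans and merges adjacent or overlapping nodes into one used span. A user-space sketch of that classification, assuming a sorted array in place of the interval tree; `struct span_node`, `struct span_out` and `span_classify()` are hypothetical and do not reproduce the kernel implementation:

```c
#include <limits.h>
#include <stddef.h>

struct span_node { unsigned long start, last; }; /* inclusive */
struct span_out { unsigned long start, last; int is_hole; };

/* Store the span only while there is room, but always count it. */
static inline void span_emit(struct span_out *out, size_t max_out,
			     size_t *count, unsigned long start,
			     unsigned long last, int is_hole)
{
	if (*count < max_out) {
		out[*count].start = start;
		out[*count].last = last;
		out[*count].is_hole = is_hole;
	}
	(*count)++;
}

/*
 * Assumptions: nodes[] is sorted by start (entries may overlap or abut)
 * and first <= last. Returns the number of spans covering [first, last].
 */
static inline size_t span_classify(const struct span_node *nodes, size_t n,
				   unsigned long first, unsigned long last,
				   struct span_out *out, size_t max_out)
{
	unsigned long cur = first;
	size_t count = 0, i = 0;

	for (;;) {
		unsigned long used_last;

		while (i < n && nodes[i].last < cur)
			i++; /* skip nodes that end before the cursor */
		if (i == n || nodes[i].start > last) {
			span_emit(out, max_out, &count, cur, last, 1);
			break;
		}
		if (nodes[i].start > cur) {
			span_emit(out, max_out, &count, cur,
				  nodes[i].start - 1, 1);
			cur = nodes[i].start;
		}
		/* Greedy merge of every node touching the used span. */
		used_last = nodes[i++].last;
		while (i < n && used_last != ULONG_MAX &&
		       nodes[i].start <= used_last + 1) {
			if (nodes[i].last > used_last)
				used_last = nodes[i].last;
			i++;
		}
		if (used_last >= last) {
			span_emit(out, max_out, &count, cur, last, 0);
			break;
		}
		span_emit(out, max_out, &count, cur, used_last, 0);
		cur = used_last + 1;
	}
	return count;
}
```

For nodes `[0x10, 0x1f]`, `[0x20, 0x2f]` and `[0x40, 0x4f]` over the range `[0, 0x5f]`, the sketch reports five spans: hole, used (the first two nodes merged), hole, used, hole.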
@@ -185,6 +185,7 @@ void free_uid(struct user_struct *up)
if (refcount_dec_and_lock_irqsave(&up->__count, &uidhash_lock, &flags))
free_user(up, flags);
}
EXPORT_SYMBOL_GPL(free_uid);
struct user_struct *alloc_uid(kuid_t uid)
{
......