    iommufd: vfio container FD ioctl compatibility · d624d665
    Jason Gunthorpe authored
    iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
    mapping them into io_pagetable operations.
    
    A userspace application can test against iommufd and confirm compatibility
    then simply make a small change to open /dev/iommu instead of
    /dev/vfio/vfio.
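
    A minimal sketch of that change, assuming the application only needs to
    open the container and check the API version. The helper names
    open_container() and open_any_container() are hypothetical and are not
    part of VFIO or iommufd; only the device paths and the
    VFIO_GET_API_VERSION ioctl come from the existing uAPI.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: open a VFIO-style container node at `path` and
 * confirm it answers VFIO_GET_API_VERSION with the expected version.
 * Returns the open fd on success, -1 on any failure.
 */
int open_container(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    /* Both the legacy container and the compat path are expected to
     * report VFIO_API_VERSION here. */
    if (ioctl(fd, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: prefer the iommufd node and fall back to the
 * legacy VFIO container if /dev/iommu is unavailable.
 */
int open_any_container(void)
{
    int fd = open_container("/dev/iommu");

    if (fd < 0)
        fd = open_container("/dev/vfio/vfio");
    return fd;
}
```

    The rest of the application's container ioctls would then be issued
    against the returned fd unchanged, subject to the differences listed
    further down.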
    
    For testing purposes /dev/vfio/vfio can be symlinked to /dev/iommu and
    then all applications will use the compatibility path with no code
    changes. A later series allows /dev/vfio/vfio to be directly provided by
    iommufd, which allows the rlimit mode to work the same as well.
    
    This series just provides the iommufd side of compatibility. Actually
    linking this to VFIO_SET_CONTAINER is a followup series, with a link in
    the cover letter.
    
    Internally the compatibility API uses a normal IOAS object that, like
    vfio, is automatically allocated when the first device is
    attached.
    
    Userspace can also query or set this IOAS object directly using the
    IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only
    features while still using the VFIO style map/unmap ioctls.
    
    While this is enough to operate qemu, it has a few differences:
    
     - Resource limits rely on memory cgroups to bound what userspace can do
       instead of the module parameter dma_entry_limit.
    
     - VFIO P2P is not implemented. The DMABUF patches for vfio are a start at
       a solution where iommufd would import a special DMABUF. This is to avoid
       further propagating the follow_pfn() security problem.
    
     - A full audit for pedantic compatibility details (eg errnos, etc) has
       not yet been done.
    
     - powerpc SPAPR is left out, as it is not connected to the iommu_domain
       framework. It seems interest in SPAPR is minimal as it is currently
       non-working in v6.1-rc1. They will have to convert to the iommu
       subsystem framework to enjoy iommufd.
    
    The following are not going to be implemented and we expect to remove them
    from VFIO type1:
    
     - SW access 'dirty tracking'. As discussed in the cover letter this will
       be done in VFIO.
    
     - VFIO_TYPE1_NESTING_IOMMU
        https://lore.kernel.org/all/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/
    
     - VFIO_DMA_MAP_FLAG_VADDR
        https://lore.kernel.org/all/Yz777bJZjTyLrHEQ@nvidia.com/
    
    Link: https://lore.kernel.org/r/15-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
    Tested-by: Nicolin Chen <nicolinc@nvidia.com>
    Tested-by: Yi Liu <yi.l.liu@intel.com>
    Tested-by: Lixiao Yang <lixiao.yang@intel.com>
    Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Eric Auger <eric.auger@redhat.com>
    Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>