- 09 Sep, 2024 5 commits
-
-
Jason Gunthorpe authored
The master->cd_table is entirely contained within the struct arm_smmu_master which is guaranteed to be freed by the core code under arm_smmu_release_device(). There is no reason to use devm here, arm_smmu_free_cd_tables() is reliably called to free the CD related memory. Remove it and save some memory. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
These values can be computed from the other values already stored in the config. Move the calculation to arm_smmu_write_strtab() and do it directly before writing the registers. This moves all the logic to calculate the two registers into one function from three and saves an unimportant 16 bytes from the arm_smmu_device. Suggested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
The members here are being used for both the linear and the 2 level case, with the meaning of each item slightly different in the two cases. Split it into a clean union where both cases have their own struct with their own logical names and correct types. Adjust all the users to detect linear/2lvl and use the right sub structure and types consistently. Remove STRTAB_STE_DWORDS by changing the last places to use sizeof(struct arm_smmu_ste). Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
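The union split described in this commit can be pictured with a small userspace sketch. It is not the driver's definition: the type names `struct ste`, `struct strtab_l1` and `struct strtab_cfg`, the `two_level` flag, and the field names are all assumptions made for illustration, and the 64-byte STE size is assumed as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed hardware layouts: a 64-byte STE and an 8-byte L1 descriptor. */
struct ste { uint64_t data[8]; };
struct strtab_l1 { uint64_t l2ptr; };

/* Hypothetical config: each case gets its own struct with its own names. */
struct strtab_cfg {
	bool two_level;                 /* hypothetical discriminator */
	union {
		struct {
			struct ste *table;      /* flat array of STEs */
			uint32_t num_ents;
		} linear;
		struct {
			struct strtab_l1 *l1tab; /* array of L1 descriptors */
			uint32_t num_l1_ents;
		} l2;
	};
};

/* Size of the top-level hardware table, using sizeof() instead of DWORD macros. */
static inline size_t strtab_hw_bytes(const struct strtab_cfg *cfg)
{
	if (cfg->two_level)
		return (size_t)cfg->l2.num_l1_ents * sizeof(struct strtab_l1);
	return (size_t)cfg->linear.num_ents * sizeof(struct ste);
}
```

Every user first checks which case applies and then touches only the matching sub-structure, which is the consistency the commit message asks for.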
-
Jason Gunthorpe authored
Add types struct arm_smmu_strtab_l1 and l2 to represent the HW layout of the descriptors, and use them in most places; following patches will get the remaining places. The size of the l1 and l2 HW allocations are sizeof(struct arm_smmu_strtab_l1/2). This provides some more clarity than having raw __le64 *'s and sizes computed via macros. Remove STRTAB_L1_DESC_DWORDS. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
Don't open code the calculations of the indexes for each level, provide two functions to do that math and call them in all the places. Update all the places computing indexes. Calculate the L1 table size directly based on the max required index from the cap. Remove STRTAB_L1_SZ_SHIFT in favour of STRTAB_NUM_L2_STES. Use STRTAB_NUM_L2_STES to replace remaining open coded 1 << STRTAB_SPLIT. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
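The index math this commit centralizes might look roughly like the sketch below. The helper names `strtab_l1_idx()`, `strtab_l2_idx()` and `strtab_num_l1_entries()` are hypothetical, and the value 8 for `STRTAB_SPLIT` is an assumption; only the macro names `STRTAB_SPLIT` and `STRTAB_NUM_L2_STES` come from the commit message.

```c
#include <stdint.h>

/* Assumed split: the low 8 bits of the stream ID select the STE in an L2 table. */
#define STRTAB_SPLIT        8
#define STRTAB_NUM_L2_STES  (1u << STRTAB_SPLIT)

/* Hypothetical helper: which L1 descriptor covers this stream ID. */
static inline uint32_t strtab_l1_idx(uint32_t sid)
{
	return sid / STRTAB_NUM_L2_STES;
}

/* Hypothetical helper: which STE inside that L2 table. */
static inline uint32_t strtab_l2_idx(uint32_t sid)
{
	return sid % STRTAB_NUM_L2_STES;
}

/* L1 entries needed to cover stream IDs 0..max_sid inclusive. */
static inline uint32_t strtab_num_l1_entries(uint32_t max_sid)
{
	return strtab_l1_idx(max_sid) + 1;
}
```

Under these assumptions, stream ID 0x1234 lands in L1 entry 0x12 and L2 slot 0x34, and sizing the L1 table from the largest required index avoids a separate shift macro.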
-
- 06 Sep, 2024 1 commit
-
-
Jason Gunthorpe authored
Since v5.12 the rbtree has gained some simplifying helpers aimed at making rb tree users write less convoluted boilerplate code. Instead the caller provides a single comparison function and the helpers generate the prior open-coded stuff. Update smmu->streams to use rb_find_add() and rb_find(). Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Mostafa Saleh <smostafa@google.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-9fef8cdc2ff6+150d1-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
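The kernel rbtree helpers cannot be used outside the kernel, so the sketch below swaps in the C library's `bsearch()` over a sorted array purely to show the same pattern: the caller supplies one comparison function and the generic helper does the search. `struct stream`, `stream_cmp_key()` and `stream_find()` are hypothetical names, not driver code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-stream record keyed by stream ID. */
struct stream { uint32_t id; };

/* The single comparison function: key on the left, element on the right. */
static int stream_cmp_key(const void *key, const void *elem)
{
	uint32_t sid = *(const uint32_t *)key;
	const struct stream *s = elem;

	if (sid < s->id)
		return -1;
	if (sid > s->id)
		return 1;
	return 0;
}

/* Generic lookup driven only by the comparison function; NULL if absent. */
static struct stream *stream_find(struct stream *sorted, size_t n, uint32_t sid)
{
	return bsearch(&sid, sorted, n, sizeof(*sorted), stream_cmp_key);
}
```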
-
- 05 Sep, 2024 1 commit
-
-
Nicolin Chen authored
It's observed that, when the first 4GB of system memory was reserved, all VCMDQ allocations failed (even with the smallest qsz in the last attempt):
    arm-smmu-v3: found companion CMDQV device: NVDA200C:00
    arm-smmu-v3: option mask 0x10
    arm-smmu-v3: failed to allocate queue (0x8000 bytes) for vcmdq0
    acpi NVDA200C:00: tegra241_cmdqv: Falling back to standard SMMU CMDQ
    arm-smmu-v3: ias 48-bit, oas 48-bit (features 0x001e1fbf)
    arm-smmu-v3: allocated 524288 entries for cmdq
    arm-smmu-v3: allocated 524288 entries for evtq
    arm-smmu-v3: allocated 524288 entries for priq
This is because the 4GB reserved memory shifted the entire DMA zone from a lower 32-bit range (on a system without the 4GB carveout) to higher range, while the dev->coherent_dma_mask was set to DMA_BIT_MASK(32) by default. The dma_set_mask_and_coherent() call is done in arm_smmu_device_hw_probe() of the SMMU driver. So any DMA allocation from tegra241_cmdqv_probe() must wait until the coherent_dma_mask is correctly set.
Move the vintf/vcmdq structure initialization routine into a different op, "init_structures". Call it at the end of arm_smmu_init_structures(), where standard SMMU queues get allocated.
Most of the impl_ops aren't ready until vintf/vcmdq structure are init-ed. So replace the full impl_ops with an init_ops in __tegra241_cmdqv_probe(). And switch to tegra241_cmdqv_impl_ops later in arm_smmu_init_structures(). Note that tegra241_cmdqv_impl_ops does not link to the new init_structures op after this switch, since there is no point in having it once it's done.
Fixes: 918eb5c8 ("iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV") Reported-by:
Matt Ochs <mochs@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/530993c3aafa1b0fc3d879b8119e13c629d12e2b.1725503154.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
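The failure mode in this commit comes down to a mask comparison, sketched here in userspace. `DMA_BIT_MASK()` is written in the same shape as the kernel macro; `dma_range_fits()` is a hypothetical helper, and the example addresses are made up to mimic memory sitting above a 4GB carveout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): n low bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper: can the device address every byte of
 * [addr, addr + size) under the given DMA mask? Overflow of
 * addr + size is ignored in this sketch.
 */
static inline bool dma_range_fits(uint64_t addr, uint64_t size, uint64_t mask)
{
	return size != 0 && addr + size - 1 <= mask;
}
```

With the default 32-bit coherent mask, a 0x8000-byte queue placed at or above 0x100000000 does not fit, while the same buffer fits once a wider mask (48 bits in this sketch) has been set.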
-
- 30 Aug, 2024 11 commits
-
-
Mostafa Saleh authored
According to the spec (ARM IHI 0070 F.b), in "5.5 Fault configuration (A, R, S bits)": A STE with stage 2 translation enabled and STE.S2S == 0 is considered ILLEGAL if SMMU_IDR0.STALL_MODEL == 0b10. This is also described in the pseudocode "SteIllegal()":
    if STE.Config == '11x' then
        [..]
        if eff_idr0_stall_model == '10' && STE.S2S == '0' then
            // stall_model forcing stall, but S2S == 0
            return TRUE;
This means S2S must be set when the stall model is "ARM_SMMU_FEAT_STALL_FORCE", but currently the driver ignores that.
Although the driver could do the minimum and only set S2S for "ARM_SMMU_FEAT_STALL_FORCE", it is more consistent to match S1 behaviour, which also sets it for "ARM_SMMU_FEAT_STALL" if the master has requested stalls.
Also, since S2 stalls are enabled now, report them to the IOMMU layer; for VFIO devices this will fail anyway as VFIO doesn't register an iopf handler. Signed-off-by:
Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20240830110349.797399-2-smostafa@google.com Signed-off-by:
Will Deacon <will@kernel.org>
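The rule quoted from the spec and the policy the commit chooses can be written as two small predicates. This is a sketch only: the function names, the boolean parameters and the constant name `IDR0_STALL_MODEL_FORCE` are assumptions for illustration; the value 0b10 is the one given in the commit message.

```c
#include <stdbool.h>

/* SMMU_IDR0.STALL_MODEL value that forces stalling, per the commit text. */
#define IDR0_STALL_MODEL_FORCE 0x2u

/* Hypothetical check mirroring the quoted SteIllegal() fragment. */
static inline bool ste_s2_stall_illegal(unsigned int stall_model,
					bool s2_enabled, bool s2s)
{
	return s2_enabled && stall_model == IDR0_STALL_MODEL_FORCE && !s2s;
}

/*
 * Hypothetical policy matching the S1 behaviour described above: set S2S
 * when stalls are forced, or when stalls are supported and the master
 * asked for them.
 */
static inline bool want_s2s(bool feat_stall_force, bool feat_stall,
			    bool master_stall_enabled)
{
	return feat_stall_force || (feat_stall && master_stall_enabled);
}
```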
-
Nicolin Chen authored
When VCMDQs are assigned to a VINTF owned by a guest (HYP_OWN bit unset), only TLB and ATC invalidation commands are supported by the VCMDQ HW. So, implement the new cmdq->supports_cmd op to scan the input cmd in order to make sure that it is supported by the selected queue. Note that the guest VM shouldn't have HYP_OWN bit being set regardless of guest kernel driver writing it or not, i.e. the hypervisor running in the host OS should wire this bit to zero when trapping a write access to this VINTF_CONFIG register from a guest kernel. Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/8160292337059b91271045800e5c62f7295e2c24.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Nicolin Chen authored
The VCMDQ in the tegra241-cmdqv driver has a guest mode that supports only a few invalidation commands. A batch is initialized with a cmdq, so it has to confirm whether a new command is supported or not. Add a supports_cmd function pointer to the cmdq structure, where the vcmdq driver should hook a command scan function. Add an inline helper too so it can be used by both sides. If a new command is not supported, simply issue the existing batch and re-init it as a new batch. Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/aafb24b881504f18c5d0c7c15f2134e40ad2c486.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
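The batch behaviour described in this commit can be sketched in userspace as follows. Everything here is illustrative: the struct layouts, the opcodes `OP_TLBI`/`OP_CFGI`, the `fallback` parameter and the helper names are assumptions, and "issuing" a batch is reduced to bumping a counter. Only the idea of a `supports_cmd` function pointer on the cmdq comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 4                       /* illustrative batch capacity */
enum { OP_TLBI = 1, OP_CFGI = 2 };        /* hypothetical opcodes */

struct cmdq {
	bool (*supports_cmd)(uint8_t opcode); /* NULL: every command is accepted */
	unsigned int issued;                  /* commands submitted so far (demo) */
};

struct cmdq_batch {
	struct cmdq *cmdq;                    /* queue chosen when the batch was set up */
	uint8_t cmds[BATCH_MAX];
	size_t num;
};

/* Inline helper usable by both the core and the queue implementation. */
static inline bool cmdq_supports_cmd(const struct cmdq *q, uint8_t opcode)
{
	return q->supports_cmd ? q->supports_cmd(opcode) : true;
}

/* Demo filter for a guest-mode queue: only the invalidation opcode passes. */
static bool demo_guest_filter(uint8_t opcode)
{
	return opcode == OP_TLBI;
}

/* "Issue" the pending commands to the batch's queue and empty the batch. */
static void batch_flush(struct cmdq_batch *b)
{
	b->cmdq->issued += (unsigned int)b->num;
	b->num = 0;
}

/* Add one command; on an unsupported command, flush and re-init on fallback. */
static void batch_add(struct cmdq_batch *b, struct cmdq *fallback, uint8_t opcode)
{
	if (!cmdq_supports_cmd(b->cmdq, opcode)) {
		batch_flush(b);
		b->cmdq = fallback;
	}
	if (b->num == BATCH_MAX)
		batch_flush(b);
	b->cmds[b->num++] = opcode;
}
```

Keeping the check inside the add path means callers never have to know which queue they ended up on; a batch simply splits at the first command its queue cannot take.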
-
Nate Watterson authored
NVIDIA's Tegra241 SoC has CMDQ-Virtualization (CMDQV) hardware, extending the standard ARM SMMU v3 IP to support multiple VCMDQs with virtualization capabilities. In terms of command queue, they are very much like a standard SMMU CMDQ (or ECMDQs), but only support CS_NONE in the CS field of CMD_SYNC. Add a new tegra241-cmdqv driver, insert its structure pointer into the existing arm_smmu_device, and then add related function calls in the SMMUv3 driver to interact with the CMDQV driver. In the CMDQV driver, add a minimal part for the in-kernel support: reserve VINTF0 for in-kernel use, assign some of the VCMDQs to the VINTF0, and select one VCMDQ based on the current CPU ID to execute supported commands. This multi-queue design for in-kernel use gives some limited improvements: up to 20% reduction of invalidation time was measured by a multi-threaded DMA unmap benchmark, compared to a single queue. The other part of the CMDQV driver will be user-space support that lets a hypervisor running on the host OS talk to the driver for virtualization use cases, allowing VMs to use VCMDQs without trappings, i.e. no VM Exits. This is designed based on IOMMUFD, and its RFC series is also under review. It will provide a guest OS a bigger improvement: 70% to 90% reductions of TLB invalidation time were measured by DMA unmap tests running in a guest, compared to nested SMMU CMDQ (with trappings). As the initial version, the CMDQV driver only supports ACPI configurations. Signed-off-by:
Nate Watterson <nwatterson@nvidia.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Co-developed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/dce50490b2c10b7254fb36aa73ed7ffd812b283a.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
Mimicking the arm-smmu (v2) driver, introduce a struct arm_smmu_impl_ops to accommodate impl routines. Suggested-by:
Will Deacon <will@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/8fe9f3805568aabf771fc6706c116459016bf62d.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Nicolin Chen authored
For model-specific implementation, repurpose the acpi_smmu_get_options() to a wider acpi_smmu_acpi_probe_model(). A new model can add to the list in this new function. Suggested-by:
Will Deacon <will@kernel.org> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/79716299829aeab2e55b8c7932f2634b209bb4d5.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Nicolin Chen authored
The CMDQV extension in NVIDIA Tegra241 SoC only supports CS_NONE in the CS field of CMD_SYNC. Add a new SMMU option to accommodate that. Suggested-by:
Will Deacon <will@kernel.org> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/a3cb9bb2429fbae4a59f7ef517614d226763d717.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Nicolin Chen authored
The symbols __arm_smmu_cmdq_skip_err(), arm_smmu_init_one_queue(), and arm_smmu_cmdq_init() need to be used by the tegra241-cmdqv compilation unit in a following patch. Remove the static and put prototypes in the header. Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/c4f2aa5f5f40a2e7c68b132c6d3171d6403de57a.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Nicolin Chen authored
So that this function can be used by other cmdqs than &smmu->cmdq only. Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/e11a3c0bde172c9652c2946f12bc2ceed4c3a355.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Nicolin Chen authored
The CMDQV extension on NVIDIA Tegra241 SoC only supports CS_NONE in the CS field of CMD_SYNC, unlike the standard SMMU CMDQ. Pass in the cmdq pointer directly, so the function can identify a different cmdq implementation. Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/723288287997b6dfbcd2a904d2c11e9b23f82250.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Nicolin Chen authored
The driver calls in different places the arm_smmu_get_cmdq() helper, and it's fine to do so since the helper always returns the single SMMU CMDQ. However, with NVIDIA CMDQV extension or SMMU ECMDQ, there can be multiple cmdqs in the system to select one from. And either case requires a batch of commands to be issued to the same cmdq. Thus, a cmdq has to be decided in the higher-level callers. Add a cmdq pointer in arm_smmu_cmdq_batch structure, and decide the cmdq when initializing the batch. Pass its pointer down to the bottom function. Update __arm_smmu_cmdq_issue_cmd() accordingly for single command issuers. Suggested-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/2cbf5ddefb6ea611e48d67c642271bd24421eb21.1724970714.git.nicolinc@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
- 16 Aug, 2024 1 commit
-
-
Dan Carpenter authored
The arm_smmu_domain_alloc() function returns error pointers on error. It doesn't return NULL. Update the error checking to match. Fixes: 52acd7d8 ("iommu/arm-smmu-v3: Add support for domain_alloc_user fn") Signed-off-by:
Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9208cd0d-8105-40df-93e9-bdcdf0d55eec@stanley.mountain Signed-off-by:
Will Deacon <will@kernel.org>
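The class of bug fixed here is easy to reproduce outside the kernel. The sketch below re-creates `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` as userspace stand-ins modelled on the kernel helpers; `struct domain`, `domain_alloc()` and `domain_alloc_user()` are hypothetical and only illustrate why a NULL check misses an error pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins modelled on the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct domain { int id; };

/* Hypothetical allocator: returns an error pointer on failure, never NULL. */
static struct domain *domain_alloc(int fail)
{
	static struct domain d = { .id = 1 };

	return fail ? ERR_PTR(-ENOMEM) : &d;
}

/* Hypothetical caller: must test with IS_ERR(), since "!d" is never true. */
static int domain_alloc_user(int fail, struct domain **out)
{
	struct domain *d = domain_alloc(fail);

	if (IS_ERR(d))
		return (int)PTR_ERR(d);
	*out = d;
	return 0;
}
```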
-
- 03 Jul, 2024 4 commits
-
-
Kunkun Jiang authored
If io-pgtable quirk flag indicates support for hardware update of dirty state, enable HA/HD bits in the SMMU CD and also set the DBM bit in the page descriptor. Now report the dirty page tracking capability of SMMUv3 and select IOMMUFD_DRIVER for ARM_SMMU_V3 if IOMMUFD is enabled. Co-developed-by:
Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by:
Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by:
Kunkun Jiang <jiangkunkun@huawei.com> Signed-off-by:
Joao Martins <joao.m.martins@oracle.com> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-6-shameerali.kolothum.thodi@huawei.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Joao Martins authored
This provides all the infrastructure to enable dirty tracking if the hardware has the capability and domain alloc request for it. Also, add a device_iommu_capable() check in iommufd core for IOMMU_CAP_DIRTY_TRACKING before we request a user domain with dirty tracking support. Please note, we still report no support for IOMMU_CAP_DIRTY_TRACKING as it will finally be enabled in a subsequent patch. Signed-off-by:
Joao Martins <joao.m.martins@oracle.com> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-5-shameerali.kolothum.thodi@huawei.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jean-Philippe Brucker authored
If the SMMU supports it and the kernel was built with HTTU support, probe support for Hardware Translation Table Update (HTTU), which essentially enables hardware update of the access and dirty flags. Probe and set the smmu::features for Hardware Dirty and Hardware Access bits. This is in preparation for enabling it on the context descriptors of stage 1 format. Signed-off-by:
Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by:
Joao Martins <joao.m.martins@oracle.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Signed-off-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-3-shameerali.kolothum.thodi@huawei.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Shameer Kolothum authored
This will be used by iommufd for allocating user managed domains and is also required when we add support for iommufd based dirty tracking. Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-2-shameerali.kolothum.thodi@huawei.com Signed-off-by:
Will Deacon <will@kernel.org>
-
- 02 Jul, 2024 15 commits
-
-
Jason Gunthorpe authored
The top of the 2 level stream table is (at most) 128k entries big, and two high order allocations are required. One of __le64 which is programmed into the HW (1M), and one of struct arm_smmu_strtab_l1_desc which holds the CPU pointer (3M). There is no reason to store the l2ptr_dma as nothing reads it. devm stores a copy of it and the DMA memory will be freed via devm mechanisms. span is a constant of 8+1. Remove both. This removes 16 bytes from each arm_smmu_l1_ctx_desc and saves up to 2M of memory per iommu instance. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Mostafa Saleh <smostafa@google.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/2-v2-318ed5f6983b+198f-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
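The memory figures in this commit follow from simple multiplication, worked through below. The helper `table_bytes()` is hypothetical; the 8-byte size is that of a `__le64`, and the 24-byte per-entry size of the CPU-side descriptor is inferred from the 3M figure rather than stated in the message.

```c
#include <stdint.h>

/* "At most 128k entries" in the top level of the 2 level stream table. */
#define L1_ENTRIES (128u * 1024u)
#define MIB        (1024u * 1024u)

/* Hypothetical helper: total bytes for a table of fixed-size entries. */
static inline uint64_t table_bytes(uint64_t entries, uint64_t entry_size)
{
	return entries * entry_size;
}

/*
 * 128k x  8 bytes (__le64 programmed into HW)      = 1 MiB
 * 128k x 24 bytes (CPU-side descriptor, inferred)  = 3 MiB
 * 128k x 16 bytes (removed per entry)              = 2 MiB saved
 */
```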
-
Jason Gunthorpe authored
dmam_alloc_coherent() already returns zero'd memory so cfg->strtab.l1_desc (the list of DMA addresses for the L2 entries) is already zero'd. arm_smmu_init_l1_strtab() goes through and calls arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it again. Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from arm_smmu_init_strtab_2lvl to allocate the companion struct. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Mostafa Saleh <smostafa@google.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/1-v2-318ed5f6983b+198f-smmuv3_tidy_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
The SVA cleanup made the SSID logic entirely general so all we need to do is call it with the correct cd table entry for a S1 domain. This is slightly tricky because of the ASID and how the locking works, the simple fix is to just update the ASID once we get the right locks. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/14-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
If the STE doesn't point to the CD table we can upgrade it by reprogramming the STE with the appropriate S1DSS. We may also need to turn on ATS at the same time. Keep track if the installed STE is pointing at the cd_table and the ATS state to trigger this path. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/13-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
The HW supports this, use the S1DSS bits to configure the behavior of SSID=0 which is the RID's translation. If SSID's are currently being used in the CD table then just update the S1DSS bits in the STE, remove the master_domain and leave ATS alone. For iommufd the driver design has a small problem that all the unused CD table entries are set with V=0 which will generate an event if VFIO userspace tries to use the CD entry. This patch extends this problem to include the RID as well if PASID is being used. For BLOCKED with used PASIDs the F_STREAM_DISABLED (STRTAB_STE_1_S1DSS_TERMINATE) event is generated on untagged traffic and a substream CD table entry with V=0 (removed pasid) will generate C_BAD_CD. Arguably there is no advantage to using S1DSS over the CD entry 0 with V=0. As we don't yet support PASID in iommufd this is a problem to resolve later, possibly by using EPD0 for unused CD table entries instead of V=0, and not using S1DSS for BLOCKED. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/11-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
This removes all the notifier de-duplication logic in the driver and relies on the core code to de-duplicate and allocate only one SVA domain per mm per smmu instance. This naturally gives a 1:1 relationship between SVA domain and mmu notifier. It is a significant simplification of the flow, as we end up with a single struct arm_smmu_domain for each MM and the invalidation can then be shifted to properly use the masters list like S1/S2 do. Remove all of the previous mmu_notifier, bond, shared cd, and cd refcount logic entirely. The logic here is tightly wound together with the unused BTM support. Since the BTM logic requires holding all the iommu_domains in a global ASID xarray it conflicts with the design to have a single SVA domain per PASID, as multiple SMMU instances will need to have different domains. Following patches resolve this by making the ASID xarray per-instance instead of global. However, converting the BTM code over to this methodology requires many changes. Thus, since ARM_SMMU_FEAT_BTM is never enabled, remove the parts of the BTM support for ASID sharing that interact with SVA as well. A followup series is already working on fully enabling the BTM support, that requires iommufd's VIOMMU feature to bring in the KVM's VMID as well. It will come with an already written patch to bring back the ASID sharing using a per-instance ASID xarray.
https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/
https://lore.kernel.org/linux-iommu/26-v6-228e7adf25eb+4155-smmuv3_newapi_p2_jgg@nvidia.com/
Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Michael Shavit <mshavit@google.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/10-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
Fill in the smmu_domain->devices list in the new struct arm_smmu_domain that SVA allocates. Keep track of every SSID and master that is using the domain reusing the logic for the RID attach. This is the first step to making the SVA invalidation follow the same design as S1/S2 invalidation. At present nothing will read this list. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
Currently the SVA domain is a naked struct iommu_domain; allocate a struct arm_smmu_domain instead. This is necessary to be able to use the struct arm_smmu_master_domain mechanism. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Michael Shavit <mshavit@google.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
Allow creating and managing arm_smmu_master_domain's with a non-zero SSID through the arm_smmu_attach_*() family of functions. This triggers ATC invalidation for the correct SSID in PASID cases and tracks the per-attachment SSID in the struct arm_smmu_master_domain. Generalize arm_smmu_attach_remove() to be able to remove SSID's as well by ensuring the ATC for the PASID is flushed properly. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
We no longer need a master->sva_enable to control what attaches are allowed. Instead we can tell if the attach is legal based on the current configuration of the master. Keep track of the number of valid CD entries for SSID's in the cd_table and if the cd_table has been installed in the STE directly so we know what the configuration is. The attach logic is then made into:
    - SVA bind, check if the CD is installed
    - RID attach of S2, block if SSIDs are used
    - RID attach of IDENTITY/BLOCKING, block if SSIDs are used
arm_smmu_set_pasid() is already checking if it is possible to setup a CD entry, at this patch it means the RID path already set a STE pointing at the CD table. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
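The three attach rules listed in this commit can be expressed as one predicate over the tracked state. This is a sketch of the rules as stated for this patch only; `struct cd_table_state`, `enum attach_kind` and `attach_allowed()` are hypothetical names, not the driver's.

```c
#include <stdbool.h>

/* Hypothetical view of the state the commit says is now tracked. */
struct cd_table_state {
	unsigned int used_ssids;  /* valid CD entries for non-zero SSIDs */
	bool in_ste;              /* STE currently points at the CD table */
};

enum attach_kind {
	ATTACH_SVA_BIND,
	ATTACH_RID_S2,
	ATTACH_RID_IDENTITY,
	ATTACH_RID_BLOCKING,
};

/* Decide legality from the current configuration instead of an sva_enable flag. */
static bool attach_allowed(const struct cd_table_state *t, enum attach_kind kind)
{
	switch (kind) {
	case ATTACH_SVA_BIND:
		/* Needs the CD table installed so the new CD entry is reachable. */
		return t->in_ste;
	case ATTACH_RID_S2:
	case ATTACH_RID_IDENTITY:
	case ATTACH_RID_BLOCKING:
		/* Would drop the CD table from the STE, so no SSID may be in use. */
		return t->used_ssids == 0;
	}
	return false;
}
```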
-
Jason Gunthorpe authored
Prepare to allow a S1 domain to be attached to a PASID as well. Keep track of the SSID the domain is using on each master in the arm_smmu_master_domain. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Michael Shavit <mshavit@google.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
The core code allows the domain to be changed on the fly without a forced stop in BLOCKED/IDENTITY. In this flow the driver should just continually maintain the ATS with no change while the STE is updated. ATS relies on a linked list smmu_domain->devices to keep track of which masters have the domain programmed, but this list is also used by arm_smmu_share_asid(), unrelated to ATS. Create two new functions to encapsulate this combined logic:
    arm_smmu_attach_prepare()
    <caller generates and sets the STE>
    arm_smmu_attach_commit()
The two functions can sequence both enabling ATS and disabling across the STE store. Have every update of the STE use this sequence. Installing a S1/S2 domain always enables the ATS if the PCIe device supports it. The enable flow is now ordered differently to allow it to be hitless:
    1) Add the master to the new smmu_domain->devices list
    2) Program the STE
    3) Enable ATS at PCIe
    4) Remove the master from the old smmu_domain
This flow ensures that invalidations to either domain will generate an ATC invalidation to the device while the STE is being switched. Thus we don't need to turn off the ATS anymore for correctness. The disable flow is the reverse:
    1) Disable ATS at PCIe
    2) Program the STE
    3) Invalidate the ATC
    4) Remove the master from the old smmu_domain
Move the nr_ats_masters adjustments to be close to the list manipulations. It is a count of the number of ATS enabled masters currently in the list. This is strictly before and after the STE/CD are revised, and done under the list's spin_lock. This is part of the bigger picture to allow changing the RID domain while a PASID is in use. If a SVA PASID is relying on ATS to function then changing the RID domain cannot just temporarily toggle ATS off without also wrecking the SVA PASID. The new infrastructure here is organized so that the PASID attach/detach flows will make use of it as well in following patches. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Michael Shavit <mshavit@google.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
The next patch will need to store the same master twice (with different SSIDs), so allocate memory for each list element. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Michael Shavit <mshavit@google.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
Add arm_smmu_set_pasid()/arm_smmu_remove_pasid() which are to be used by callers that already constructed the arm_smmu_cd they wish to program. These functions will encapsulate the shared logic to setup a CD entry that will be shared by SVA and S1 domain cases. Prior fixes had already moved most of this logic up into __arm_smmu_sva_bind(), move it to its final home. Following patches will relieve some of the remaining SVA restrictions:
    - The RID domain is a S1 domain and has already setup the STE to point to the CD table
    - The programmed PASID is the mm_get_enqcmd_pasid()
    - Nothing changes while SVA is running (sva_enable)
SVA invalidation will still iterate over the S1 domain's master list, later patches will resolve that. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Jason Gunthorpe authored
This allows the driver to receive the mm and always a device during allocation. Later patches need this to properly set up the notifier when the domain is first allocated. Remove ops->domain_alloc() as SVA was the only remaining purpose. Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by:
Michael Shavit <mshavit@google.com> Reviewed-by:
Nicolin Chen <nicolinc@nvidia.com> Reviewed-by:
Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com Signed-off-by:
Will Deacon <will@kernel.org>
-
- 05 Jun, 2024 1 commit
-
-
Mostafa Saleh authored
Static checker is complaining about the ASID possibly set uninitialized. This only happens in case of error and this value would be ignored anyway. A simple fix would be just to initialize the local variable to zero, this path will only be reached on the first attach to a domain where the CD is already initialized to zero. This avoids having to bloat the function with an error path. Closes: https://lore.kernel.org/linux-iommu/849e3d77-0a3c-43c4-878d-a0e061c8cd61@moroto.mountain/T/#u Reported-by:
Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by:
Mostafa Saleh <smostafa@google.com> Fixes: 04905c17 ("iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()") Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240604185218.2602058-1-smostafa@google.com Signed-off-by:
Will Deacon <will@kernel.org>
-
- 10 May, 2024 1 commit
-
-
Jason Gunthorpe authored
It turns out kconfig has problems ensuring the SMMU module and the KUNIT module are consistently y/m to allow linking. It will permit KUNIT to be a module while SMMU is built in. Also, Fedora apparently enables kunit on production kernels. So, put the entire kunit in its own module using the VISIBLE_IF_KUNIT/EXPORT_SYMBOL_IF_KUNIT machinery. This keeps it out of vmlinux on Fedora and makes the kconfig work in the normal way. There is no cost if kunit is disabled. Fixes: 56e1a4cc ("iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry") Reported-by:
Thorsten Leemhuis <linux@leemhuis.info> Link: https://lore.kernel.org/all/aeea8546-5bce-4c51-b506-5d2008e52fef@leemhuis.info Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Tested-by:
Thorsten Leemhuis <linux@leemhuis.info> Acked-by:
Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/0-v1-24cba6c0f404+2ae-smmu_kunit_module_jgg@nvidia.com Signed-off-by:
Joerg Roedel <jroedel@suse.de>
-