- 23 Aug, 2019 1 commit
Will Deacon authored
Merge branches 'for-joerg/arm-smmu/smmu-v2' and 'for-joerg/arm-smmu/smmu-v3' into for-joerg/arm-smmu/updates

* for-joerg/arm-smmu/smmu-v2:
  Refactoring to allow for implementation-specific hooks in 'arm-smmu-impl.c'

* for-joerg/arm-smmu/smmu-v3:
  Support for deferred TLB invalidation and batching of commands
  Rework ATC invalidation for ATS-enabled PCIe masters
- 22 Aug, 2019 2 commits
Will Deacon authored
This reverts commit b5e86196. Now that ATC invalidation is performed in the correct places and without incurring a locking overhead for non-ATS systems, we can re-enable the corresponding SMMU feature detection. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
When ATS is not in use, we can avoid taking the 'devices_lock' for the domain on the invalidation path by simply caching the number of ATS masters currently attached. The fiddly part is handling a concurrent ->attach() of an ATS-enabled master to a domain that is being invalidated, but we can handle this using an 'smp_mb()' to ensure that our check of the count is ordered after completion of our prior TLB invalidation. This also makes our ->attach() and ->detach() flows symmetric wrt ATS interactions. Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
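As a standalone sketch of the counting scheme (field and function names are invented here, not the driver's): the invalidation path reads a cached count of attached ATS masters only after its TLB invalidation has completed, with a full fence modelling the smp_mb() the message refers to.

#include <stdatomic.h>

struct smmu_domain_sketch {
        atomic_int nr_ats_masters;      /* cached count of ATS masters */
};

/* Called once the main SMMU TLB invalidation has completed. */
static void atc_inv_if_needed(struct smmu_domain_sketch *d)
{
        /* Order the count read after the TLB invalidation above;
           models the smp_mb() described in the commit message. */
        atomic_thread_fence(memory_order_seq_cst);

        if (atomic_load_explicit(&d->nr_ats_masters,
                                 memory_order_relaxed) == 0)
                return;         /* no ATS masters: skip devices_lock */

        /* ...take devices_lock and issue ATC invalidations... */
}

A concurrent ->attach() increments the count before issuing its own invalidation, so either the invalidating CPU sees a non-zero count or the attaching master starts out with a clean ATC.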
- 21 Aug, 2019 6 commits
Will Deacon authored
When invalidating the ATC for a PCIe endpoint using ATS, we must take care to complete invalidation of the main SMMU TLBs beforehand, otherwise the device could immediately repopulate its ATC with stale translations. Hooking the ATC invalidation into ->unmap() as we currently do does the exact opposite: it ensures that the ATC is invalidated *before* the main TLBs, which is bogus. Move ATC invalidation into the actual (leaf) invalidation routines so that it is always called after completing main TLB invalidation. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
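A rough sketch of the resulting ordering (all helper names hypothetical):

#include <stddef.h>

struct smmu_domain;     /* opaque for this sketch */

static void issue_tlbi_range(struct smmu_domain *d, unsigned long iova,
                             size_t size) { /* queue TLBI commands */ }
static void issue_cmd_sync(struct smmu_domain *d) { /* wait for completion */ }
static void atc_inv_domain(struct smmu_domain *d, unsigned long iova,
                           size_t size) { /* ATC invalidation */ }

static void tlb_inv_leaf(struct smmu_domain *d, unsigned long iova, size_t size)
{
        issue_tlbi_range(d, iova, size);
        issue_cmd_sync(d);               /* main TLBs are now clean */
        atc_inv_domain(d, iova, size);   /* safe: nothing stale to refetch */
}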
Will Deacon authored
To prevent any potential issues arising from speculative Address Translation Requests from an ATS-enabled PCIe endpoint, rework our ATS enabling/disabling logic so that we enable ATS at the SMMU before we enable it at the endpoint, and disable things in the opposite order. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
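Sketched out, the ordering looks like this (the SMMU-side helpers are invented stand-ins; in the kernel the endpoint side is pci_enable_ats()/pci_disable_ats()):

struct master;  /* opaque for this sketch */

static void smmu_enable_ats(struct master *m)      { }
static void smmu_disable_ats(struct master *m)     { }
static void endpoint_enable_ats(struct master *m)  { }
static void endpoint_disable_ats(struct master *m) { }

static void attach(struct master *m)
{
        smmu_enable_ats(m);       /* SMMU ready for ATS requests... */
        endpoint_enable_ats(m);   /* ...before the endpoint can send any */
}

static void detach(struct master *m)
{
        endpoint_disable_ats(m);  /* quiesce the endpoint first */
        smmu_disable_ats(m);      /* then the SMMU side, in reverse */
}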
Will Deacon authored
Calling arm_smmu_tlb_inv_range() with a size of zero, perhaps due to an empty 'iommu_iotlb_gather' structure, should be a NOP. Elide the CMD_SYNC when there is no invalidation to be performed. Signed-off-by: Will Deacon <will@kernel.org>
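A minimal sketch of the early-out (function name illustrative):

#include <stddef.h>

static void tlb_inv_range(unsigned long iova, size_t size)
{
        if (!size)
                return;  /* empty gather: no TLBI, and no CMD_SYNC either */

        /* ...build and submit TLBI commands followed by CMD_SYNC... */
}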
Will Deacon authored
There's really no need for this to be a bitfield, particularly as we don't have bitwise addressing on arm64. Signed-off-by: Will Deacon <will@kernel.org>
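Illustratively (the field name is invented), the change is of this shape; a single-bit bitfield cannot have its address taken, which rules out the usual atomics and accessor macros:

struct before { unsigned int ats_enabled : 1; };  /* can't do &m->ats_enabled */
struct after  { _Bool        ats_enabled;     };  /* plain flag, addressable */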
Will Deacon authored
Detecting the ATS capability of the SMMU at probe time introduces a spinlock into the ->unmap() fast path, even when ATS is not actually in use. Furthermore, the ATC invalidation that exists is broken, as it occurs before invalidation of the main SMMU TLB which leaves a window where the ATC can be repopulated with stale entries. Given that ATS is both a new feature and a specialist sport, disable it for now whilst we fix it properly in subsequent patches. Since PRI requires ATS, disable that too. Cc: <stable@vger.kernel.org> Fixes: 9ce27afc ("iommu/arm-smmu-v3: Add support for PCI ATS") Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
It turns out that we've always relied on some subtle ordering guarantees when inserting commands into the SMMUv3 command queue. With the recent changes to elide locking when possible, these guarantees become more subtle and even more important. Add a comment documenting the barrier semantics of command insertion so that we don't have to derive the behaviour from scratch each time it comes up on the list. Signed-off-by: Will Deacon <will@kernel.org>
- 20 Aug, 2019 2 commits
Robin Murphy authored
As part of the grand SMMU driver refactoring effort, the I/O register accessors were moved into 'arm-smmu.h' in commit 6d7dff62 ("iommu/arm-smmu: Move Secure access quirk to implementation"). On 32-bit architectures (such as ARM), the 64-bit accessors are defined in 'linux/io-64-nonatomic-hi-lo.h', so include this header to fix the build. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
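The fix is the include named in the message, which supplies readq()/writeq() (and their relaxed variants) as a pair of 32-bit accesses on targets without native 64-bit MMIO:

#include <linux/io-64-nonatomic-hi-lo.h>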
Will Deacon authored
Many of the device-specific implementation details in 'arm-smmu-impl.c' are exposed to other compilation units. Whilst we may require this in the future, let's make it all 'static' for now so that we can expose things on a case-by-case basis. Signed-off-by: Will Deacon <will@kernel.org>
- 19 Aug, 2019 17 commits
Robin Murphy authored
Allocating and initialising a context for a domain is another point where certain implementations are known to want special behaviour. Currently the other half of the Cavium workaround comes into play here, so let's finish the job to get the whole thing right out of the way. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Reset is an activity rife with implementation-defined poking. Add a corresponding hook, and use it to encapsulate the existing MMU-500 details. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Probing the ID registers and setting up the SMMU configuration is an area where overrides and workarounds may well be needed. Indeed, the Cavium workaround detection lives there at the moment, so let's break that out. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Move detection of the Secure access quirk to its new home, trimming it down in the process - time has proven that boolean DT flags are neither ideal nor necessarily sufficient, so it's highly unlikely we'll ever add more, let alone enough to justify the frankly overengineered parsing machinery. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Add some nascent infrastructure for handling implementation-specific details outside the flow of the architectural code. This will allow us to keep mutually-incompatible vendor-specific hooks in their own files where the respective interested parties can maintain them with minimal chance of conflicts. As somewhat of a template, we'll start with a general place to collect the relatively trivial existing quirks. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
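The rough shape of that infrastructure, given the hooks added over this series (a standalone sketch; the real struct arm_smmu_impl lives in the driver header and differs in detail):

#include <stdint.h>

struct arm_smmu_device;  /* opaque here */

/* Each hook is optional: the architectural code calls it only if the
   implementation provides it, so vendor quirks stay in their own files. */
struct smmu_impl_sketch {
        uint32_t (*read_reg)(struct arm_smmu_device *smmu, int page, int off);
        void (*write_reg)(struct arm_smmu_device *smmu, int page, int off,
                          uint32_t val);
        int (*cfg_probe)(struct arm_smmu_device *smmu);    /* ID-reg overrides */
        int (*reset)(struct arm_smmu_device *smmu);        /* e.g. MMU-500 poking */
        int (*init_context)(struct arm_smmu_device *smmu); /* context alloc quirks */
};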
Robin Murphy authored
We're about to start using it for more than just register definitions, so generalise the name. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Clean up the remaining accesses to GR0 registers, so that everything is now neatly abstracted. This folds up the Non-Secure alias quirk as the first step towards moving it out of the way entirely. Although GR0 does technically contain some 64-bit registers (sGFAR and the weird SMMUv2 HYPC and MONC stuff), they're not ones we have any need to access. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Context bank accesses are fiddly enough to deserve a number of extra helpers to keep the callsites looking sane, even though there are only one or two of each. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Introduce some register access abstractions which we will later use to encapsulate various quirks. GR1 is the easiest page to start with. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
The smmu_write_atomic_lq oddity made some sense when the context format was effectively tied to CONFIG_64BIT, but these days it's simpler to just pick an explicit access size based on the format for the one-and-a-half times we actually care. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Since we now use separate iommu_gather_ops for stage 1 and stage 2 contexts, we may as well divide up the monolithic callback into its respective stage 1 and stage 2 parts. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
To keep register-access quirks manageable, we want to structure things to avoid needing too many individual overrides. It seems fairly clean to have a single interface which handles both global and context registers in terms of the architectural pages, so the first preparatory step is to rework cb_base into a page number rather than an absolute address. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
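A sketch of the page-based addressing this enables (names illustrative): with cb_base a page *number* rather than an absolute address, one helper can service both global and context-bank registers.

#include <stdint.h>

struct smmu_dev_sketch {
        void    *base;       /* ioremapped register space */
        unsigned pgshift;    /* log2 of the architectural page size */
        unsigned cb_base;    /* first context-bank page number */
};

static volatile uint32_t *smmu_reg(struct smmu_dev_sketch *s, unsigned page,
                                   unsigned offset)
{
        return (volatile uint32_t *)((char *)s->base +
                                     ((uintptr_t)page << s->pgshift) + offset);
}

/* Context bank n is simply page (cb_base + n): */
static uint32_t smmu_cb_read(struct smmu_dev_sketch *s, unsigned n, unsigned off)
{
        return *smmu_reg(s, s->cb_base + n, off);
}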
Robin Murphy authored
Finish the final part of the job, once again updating some names to match the current spec. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
As for GR0, use the bitfield helpers to make GR1 usage a little cleaner, and use it as an opportunity to audit and tidy the definitions. This tweaks the handling of CBAR types to match what we did for S2CR a while back, and fixes a couple of names which didn't quite match the latest architecture spec (IHI0062D.c). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
FIELD_PREP remains a terrible name, but the overall simplification will make further work on this stuff that much more manageable. This also serves as an audit of the header, wherein we can impose a consistent grouping and ordering of the offset and field definitions. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
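For illustration, a self-contained approximation of the style (the kernel versions are GENMASK(), FIELD_PREP() and FIELD_GET() from linux/bits.h and linux/bitfield.h; the macros below assume 32-bit unsigned int and GCC/Clang builtins, and the example field is only representative):

#include <stdint.h>

#define SK_GENMASK(h, l)    (((~0u) << (l)) & (~0u >> (31 - (h))))
#define SK_SHIFT(m)         (__builtin_ctz(m))
#define SK_FIELD_PREP(m, v) (((uint32_t)(v) << SK_SHIFT(m)) & (m))
#define SK_FIELD_GET(m, r)  (((r) & (m)) >> SK_SHIFT(m))

#define CBAR_TYPE           SK_GENMASK(17, 16)  /* example field */

static uint32_t make_cbar(uint32_t type)
{
        return SK_FIELD_PREP(CBAR_TYPE, type);  /* vs. open-coded shifts */
}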
Robin Murphy authored
As with arm-smmu from whence this code was borrowed, the IOVAs passed in here happen to be at least page-aligned anyway, but still; oh dear. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
The less said about "~12UL" the better. Oh dear. We get away with it due to calling constraints that mean IOVAs are implicitly at least page-aligned to begin with, but still; oh dear. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
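The bug class in miniature: 12 is a shift amount, not a mask, so ~12UL clears only bits 2 and 3 rather than the 4K page offset.

#include <stdio.h>

int main(void)
{
        unsigned long iova = 0x12345678UL;

        printf("%#lx\n", iova & ~12UL);     /* 0x12345674: wrong */
        printf("%#lx\n", iova & ~0xfffUL);  /* 0x12345000: page-aligned */
        return 0;
}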
- 08 Aug, 2019 2 commits
Will Deacon authored
Update the iommu_iotlb_gather structure passed to ->tlb_add_page() and use this information to defer all TLB invalidation until ->iotlb_sync(). This drastically reduces contention on the command queue, since we can insert our commands in batches rather than one-by-one. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
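A sketch of the deferral (it mirrors the iommu_iotlb_gather idea, but all names here are illustrative): ->tlb_add_page() only widens a pending range, and a single flush happens in ->iotlb_sync().

#include <stddef.h>
#include <limits.h>

static void issue_tlbi_range(unsigned long iova, size_t size) { /* batch */ }

struct gather_sketch {
        unsigned long start, end;   /* accumulated invalidation range */
};

static void gather_init(struct gather_sketch *g)
{
        g->start = ULONG_MAX;
        g->end = 0;
}

static void tlb_add_page(struct gather_sketch *g, unsigned long iova,
                         size_t granule)
{
        if (iova < g->start)
                g->start = iova;
        if (iova + granule > g->end)
                g->end = iova + granule;
        /* no command-queue traffic here */
}

static void iotlb_sync(struct gather_sketch *g)
{
        if (g->end > g->start)
                issue_tlbi_range(g->start, g->end - g->start); /* one batch */
        gather_init(g);
}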
Will Deacon authored
The SMMU command queue is a bottleneck in large systems, thanks to the spin_lock which serialises accesses from all CPUs to the single queue supported by the hardware. Attempt to improve this situation by moving to a new algorithm for inserting commands into the queue, which is lock-free on the fast-path. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
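Heavily simplified, the fast path looks like this (a sketch only; the real algorithm additionally handles wrap bits, a full queue and CMD_SYNC completion tracking): each producer claims space with a single atomic update, then fills its slots without holding any lock.

#include <stdatomic.h>
#include <stdint.h>

struct cmdq_sketch {
        _Atomic uint32_t prod;   /* shared producer index */
};

static uint32_t cmdq_claim(struct cmdq_sketch *q, uint32_t n_cmds)
{
        /* One atomic op replaces the global spinlock for space allocation;
           the caller then writes its commands into the returned slots. */
        return atomic_fetch_add_explicit(&q->prod, n_cmds,
                                         memory_order_relaxed);
}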
- 29 Jul, 2019 10 commits
Will Deacon authored
In preparation for rewriting the command queue insertion code to use a new algorithm, rework many of our queue macro accessors and manipulation functions so that they operate on the arm_smmu_ll_queue structure where possible. This will allow us to call these helpers on local variables without having to construct a full-blown arm_smmu_queue on the stack. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
In preparation for rewriting the command queue insertion code to use a new algorithm, introduce a new arm_smmu_ll_queue structure which contains only the information necessary to perform queue arithmetic for a queue and will later be extended so that we can perform complex atomic manipulation on some of the fields. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
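A sketch of the queue arithmetic involved (it mirrors the arm_smmu_ll_queue concept; names are illustrative): prod and cons each carry a wrap bit just above the index bits, so empty versus full falls out of two comparisons.

#include <stdint.h>
#include <stdbool.h>

struct ll_queue_sketch {
        uint32_t prod, cons;
        uint32_t max_n_shift;   /* log2(number of entries) */
};

static uint32_t q_idx(struct ll_queue_sketch *q, uint32_t p)
{
        return p & ((1u << q->max_n_shift) - 1);
}

static uint32_t q_wrap(struct ll_queue_sketch *q, uint32_t p)
{
        return p & (1u << q->max_n_shift);
}

static bool q_empty(struct ll_queue_sketch *q)
{
        return q_idx(q, q->prod) == q_idx(q, q->cons) &&
               q_wrap(q, q->prod) == q_wrap(q, q->cons);
}

static bool q_full(struct ll_queue_sketch *q)
{
        return q_idx(q, q->prod) == q_idx(q, q->cons) &&
               q_wrap(q, q->prod) != q_wrap(q, q->cons);
}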
Will Deacon authored
The Q_OVF macro doesn't need to access the arm_smmu_queue structure, so drop the unused macro argument. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
In preparation for rewriting the command queue insertion code to use a new algorithm, separate the software and hardware views of the prod and cons indexes so that manipulating the software state doesn't automatically update the hardware state at the same time. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
With all the pieces in place, we can finally propagate the iommu_iotlb_gather structure from the call to unmap() down to the IOMMU drivers' implementation of ->tlb_add_page(). Currently everybody ignores it, but the machinery is now there to defer invalidation. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
Update the io-pgtable ->unmap() function to take an iommu_iotlb_gather pointer as an argument, and update the callers as appropriate. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
The ->tlb_sync() callback is no longer used, so it can be removed. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
The ->tlb_add_flush() callback in the io-pgtable API now looks a bit silly: - It takes a size and a granule, which are always the same - It takes a 'bool leaf', which is always true - It only ever flushes a single page With that in mind, replace it with an optional ->tlb_add_page() callback that drops the useless parameters. Signed-off-by: Will Deacon <will@kernel.org>
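The resulting callback table looks roughly like this (modelled on the flush-ops structure this series shapes; treat it as an approximation rather than the exact mainline definition):

#include <stddef.h>

struct iommu_iotlb_gather;      /* opaque here */

struct flush_ops_sketch {
        void (*tlb_flush_all)(void *cookie);
        void (*tlb_flush_walk)(unsigned long iova, size_t size,
                               size_t granule, void *cookie);
        void (*tlb_flush_leaf)(unsigned long iova, size_t size,
                               size_t granule, void *cookie);
        /* optional: one page, one granule, no redundant size/leaf args */
        void (*tlb_add_page)(struct iommu_iotlb_gather *gather,
                             unsigned long iova, size_t granule,
                             void *cookie);
};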
Will Deacon authored
Now that all IOMMU drivers using the io-pgtable API implement the ->tlb_flush_walk() and ->tlb_flush_leaf() callbacks, we can use them in the io-pgtable code instead of ->tlb_add_flush() immediately followed by ->tlb_sync(). Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers using the io-pgtable API so that we can start making use of them in the page-table code. For now, they can just wrap the implementations of ->tlb_add_flush and ->tlb_sync pending future optimisation in each driver. Signed-off-by: Will Deacon <will@kernel.org>