- 23 Aug, 2019 1 commit
Will Deacon authored
Merge branches 'for-joerg/arm-smmu/smmu-v2' and 'for-joerg/arm-smmu/smmu-v3' into for-joerg/arm-smmu/updates

* for-joerg/arm-smmu/smmu-v2:
  Refactoring to allow for implementation-specific hooks in 'arm-smmu-impl.c'

* for-joerg/arm-smmu/smmu-v3:
  Support for deferred TLB invalidation and batching of commands
  Rework ATC invalidation for ATS-enabled PCIe masters
- 22 Aug, 2019 2 commits
Will Deacon authored
This reverts commit b5e86196. Now that ATC invalidation is performed in the correct places and without incurring a locking overhead for non-ATS systems, we can re-enable the corresponding SMMU feature detection. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
When ATS is not in use, we can avoid taking the 'devices_lock' for the domain on the invalidation path by simply caching the number of ATS masters currently attached. The fiddly part is handling a concurrent ->attach() of an ATS-enabled master to a domain that is being invalidated, but we can handle this using an 'smp_mb()' to ensure that our check of the count is ordered after completion of our prior TLB invalidation. This also makes our ->attach() and ->detach() flows symmetric wrt ATS interactions. Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
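As a standalone sketch of the counting scheme (field and function names are invented here, not the driver's): the invalidation path reads a cached count of attached ATS masters only after its TLB invalidation has completed, with a full fence modelling the smp_mb() the message refers to.

#include <stdatomic.h>

struct smmu_domain_sketch {
        atomic_int nr_ats_masters;      /* cached count of ATS masters */
};

/* Called once the main SMMU TLB invalidation has completed. */
static void atc_inv_if_needed(struct smmu_domain_sketch *d)
{
        /* Order the count read after the TLB invalidation above;
           models the smp_mb() described in the commit message. */
        atomic_thread_fence(memory_order_seq_cst);

        if (atomic_load_explicit(&d->nr_ats_masters,
                                 memory_order_relaxed) == 0)
                return;         /* no ATS masters: skip devices_lock */

        /* ...take devices_lock and issue ATC invalidations... */
}

A concurrent ->attach() increments the count before issuing its own invalidation, so either the invalidating CPU sees a non-zero count or the attaching master starts out with a clean ATC.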
- 21 Aug, 2019 6 commits
Will Deacon authored
When invalidating the ATC for a PCIe endpoint using ATS, we must take care to complete invalidation of the main SMMU TLBs beforehand, otherwise the device could immediately repopulate its ATC with stale translations. Hooking the ATC invalidation into ->unmap() as we currently do does the exact opposite: it ensures that the ATC is invalidated *before* the main TLBs, which is bogus. Move ATC invalidation into the actual (leaf) invalidation routines so that it is always called after completing main TLB invalidation. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
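A rough sketch of the resulting ordering (all helper names hypothetical):

#include <stddef.h>

struct smmu_domain;     /* opaque for this sketch */

static void issue_tlbi_range(struct smmu_domain *d, unsigned long iova,
                             size_t size) { /* queue TLBI commands */ }
static void issue_cmd_sync(struct smmu_domain *d) { /* wait for completion */ }
static void atc_inv_domain(struct smmu_domain *d, unsigned long iova,
                           size_t size) { /* ATC invalidation */ }

static void tlb_inv_leaf(struct smmu_domain *d, unsigned long iova, size_t size)
{
        issue_tlbi_range(d, iova, size);
        issue_cmd_sync(d);               /* main TLBs are now clean */
        atc_inv_domain(d, iova, size);   /* safe: nothing stale to refetch */
}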
Will Deacon authored
To prevent any potential issues arising from speculative Address Translation Requests from an ATS-enabled PCIe endpoint, rework our ATS enabling/disabling logic so that we enable ATS at the SMMU before we enable it at the endpoint, and disable things in the opposite order. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
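Sketched out, the ordering looks like this (the SMMU-side helpers are invented stand-ins; in the kernel the endpoint side is pci_enable_ats()/pci_disable_ats()):

struct master;  /* opaque for this sketch */

static void smmu_enable_ats(struct master *m)      { }
static void smmu_disable_ats(struct master *m)     { }
static void endpoint_enable_ats(struct master *m)  { }
static void endpoint_disable_ats(struct master *m) { }

static void attach(struct master *m)
{
        smmu_enable_ats(m);       /* SMMU ready for ATS requests... */
        endpoint_enable_ats(m);   /* ...before the endpoint can send any */
}

static void detach(struct master *m)
{
        endpoint_disable_ats(m);  /* quiesce the endpoint first */
        smmu_disable_ats(m);      /* then the SMMU side, in reverse */
}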
Will Deacon authored
Calling arm_smmu_tlb_inv_range() with a size of zero, perhaps due to an empty 'iommu_iotlb_gather' structure, should be a NOP. Elide the CMD_SYNC when there is no invalidation to be performed. Signed-off-by: Will Deacon <will@kernel.org>
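A minimal sketch of the early-out (function name illustrative):

#include <stddef.h>

static void tlb_inv_range(unsigned long iova, size_t size)
{
        if (!size)
                return;  /* empty gather: no TLBI, and no CMD_SYNC either */

        /* ...build and submit TLBI commands followed by CMD_SYNC... */
}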
Will Deacon authored
There's really no need for this to be a bitfield, particularly as we don't have bitwise addressing on arm64. Signed-off-by: Will Deacon <will@kernel.org>
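Illustratively (the field name is invented), the change is of this shape; a single-bit bitfield cannot have its address taken, which rules out the usual atomics and accessor macros:

struct before { unsigned int ats_enabled : 1; };  /* can't do &m->ats_enabled */
struct after  { _Bool        ats_enabled;     };  /* plain flag, addressable */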
Will Deacon authored
Detecting the ATS capability of the SMMU at probe time introduces a spinlock into the ->unmap() fast path, even when ATS is not actually in use. Furthermore, the ATC invalidation that exists is broken, as it occurs before invalidation of the main SMMU TLB which leaves a window where the ATC can be repopulated with stale entries. Given that ATS is both a new feature and a specialist sport, disable it for now whilst we fix it properly in subsequent patches. Since PRI requires ATS, disable that too. Cc: <stable@vger.kernel.org> Fixes: 9ce27afc ("iommu/arm-smmu-v3: Add support for PCI ATS") Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
It turns out that we've always relied on some subtle ordering guarantees when inserting commands into the SMMUv3 command queue. With the recent changes to elide locking when possible, these guarantees become more subtle and even more important. Add a comment documenting the barrier semantics of command insertion so that we don't have to derive the behaviour from scratch each time it comes up on the list. Signed-off-by: Will Deacon <will@kernel.org>
- 20 Aug, 2019 2 commits
Robin Murphy authored
As part of the grand SMMU driver refactoring effort, the I/O register accessors were moved into 'arm-smmu.h' in commit 6d7dff62 ("iommu/arm-smmu: Move Secure access quirk to implementation"). On 32-bit architectures (such as ARM), the 64-bit accessors are defined in 'linux/io-64-nonatomic-hi-lo.h', so include this header to fix the build. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
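The fix is the include named in the message, which supplies readq()/writeq() (and their relaxed variants) as a pair of 32-bit accesses on targets without native 64-bit MMIO:

#include <linux/io-64-nonatomic-hi-lo.h>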
Will Deacon authored
Many of the device-specific implementation details in 'arm-smmu-impl.c' are exposed to other compilation units. Whilst we may require this in the future, let's make it all 'static' for now so that we can expose things on a case-by-case basis. Signed-off-by: Will Deacon <will@kernel.org>
- 19 Aug, 2019 17 commits
Robin Murphy authored
Allocating and initialising a context for a domain is another point where certain implementations are known to want special behaviour. Currently the other half of the Cavium workaround comes into play here, so let's finish the job to get the whole thing right out of the way. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Reset is an activity rife with implementation-defined poking. Add a corresponding hook, and use it to encapsulate the existing MMU-500 details. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Probing the ID registers and setting up the SMMU configuration is an area where overrides and workarounds may well be needed. Indeed, the Cavium workaround detection lives there at the moment, so let's break that out. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Move detection of the Secure access quirk to its new home, trimming it down in the process - time has proven that boolean DT flags are neither ideal nor necessarily sufficient, so it's highly unlikely we'll ever add more, let alone enough to justify the frankly overengineered parsing machinery. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Add some nascent infrastructure for handling implementation-specific details outside the flow of the architectural code. This will allow us to keep mutually-incompatible vendor-specific hooks in their own files where the respective interested parties can maintain them with minimal chance of conflicts. As somewhat of a template, we'll start with a general place to collect the relatively trivial existing quirks. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
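The rough shape of that infrastructure, given the hooks added over this series (a standalone sketch; the real struct arm_smmu_impl lives in the driver header and differs in detail):

#include <stdint.h>

struct arm_smmu_device;  /* opaque here */

/* Each hook is optional: the architectural code calls it only if the
   implementation provides it, so vendor quirks stay in their own files. */
struct smmu_impl_sketch {
        uint32_t (*read_reg)(struct arm_smmu_device *smmu, int page, int off);
        void (*write_reg)(struct arm_smmu_device *smmu, int page, int off,
                          uint32_t val);
        int (*cfg_probe)(struct arm_smmu_device *smmu);    /* ID-reg overrides */
        int (*reset)(struct arm_smmu_device *smmu);        /* e.g. MMU-500 poking */
        int (*init_context)(struct arm_smmu_device *smmu); /* context alloc quirks */
};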
Robin Murphy authored
We're about to start using it for more than just register definitions, so generalise the name. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Clean up the remaining accesses to GR0 registers, so that everything is now neatly abstracted. This folds up the Non-Secure alias quirk as the first step towards moving it out of the way entirely. Although GR0 does technically contain some 64-bit registers (sGFAR and the weird SMMUv2 HYPC and MONC stuff), they're not ones we have any need to access. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Context bank accesses are fiddly enough to deserve a number of extra helpers to keep the callsites looking sane, even though there are only one or two of each. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Introduce some register access abstractions which we will later use to encapsulate various quirks. GR1 is the easiest page to start with. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
The smmu_write_atomic_lq oddity made some sense when the context format was effectively tied to CONFIG_64BIT, but these days it's simpler to just pick an explicit access size based on the format for the one-and-a-half times we actually care. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
Since we now use separate iommu_gather_ops for stage 1 and stage 2 contexts, we may as well divide up the monolithic callback into its respective stage 1 and stage 2 parts. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
To keep register-access quirks manageable, we want to structure things to avoid needing too many individual overrides. It seems fairly clean to have a single interface which handles both global and context registers in terms of the architectural pages, so the first preparatory step is to rework cb_base into a page number rather than an absolute address. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
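A sketch of the page-based addressing this enables (names illustrative): with cb_base a page *number* rather than an absolute address, one helper can service both global and context-bank registers.

#include <stdint.h>

struct smmu_dev_sketch {
        void    *base;       /* ioremapped register space */
        unsigned pgshift;    /* log2 of the architectural page size */
        unsigned cb_base;    /* first context-bank page number */
};

static volatile uint32_t *smmu_reg(struct smmu_dev_sketch *s, unsigned page,
                                   unsigned offset)
{
        return (volatile uint32_t *)((char *)s->base +
                                     ((uintptr_t)page << s->pgshift) + offset);
}

/* Context bank n is simply page (cb_base + n): */
static uint32_t smmu_cb_read(struct smmu_dev_sketch *s, unsigned n, unsigned off)
{
        return *smmu_reg(s, s->cb_base + n, off);
}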
Robin Murphy authored
Finish the final part of the job, once again updating some names to match the current spec. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
As for GR0, use the bitfield helpers to make GR1 usage a little cleaner, and use it as an opportunity to audit and tidy the definitions. This tweaks the handling of CBAR types to match what we did for S2CR a while back, and fixes a couple of names which didn't quite match the latest architecture spec (IHI0062D.c). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
FIELD_PREP remains a terrible name, but the overall simplification will make further work on this stuff that much more manageable. This also serves as an audit of the header, wherein we can impose a consistent grouping and ordering of the offset and field definitions. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
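For illustration, a self-contained approximation of the style (the kernel versions are GENMASK(), FIELD_PREP() and FIELD_GET() from linux/bits.h and linux/bitfield.h; the macros below assume 32-bit unsigned int and GCC/Clang builtins, and the example field is only representative):

#include <stdint.h>

#define SK_GENMASK(h, l)    (((~0u) << (l)) & (~0u >> (31 - (h))))
#define SK_SHIFT(m)         (__builtin_ctz(m))
#define SK_FIELD_PREP(m, v) (((uint32_t)(v) << SK_SHIFT(m)) & (m))
#define SK_FIELD_GET(m, r)  (((r) & (m)) >> SK_SHIFT(m))

#define CBAR_TYPE           SK_GENMASK(17, 16)  /* example field */

static uint32_t make_cbar(uint32_t type)
{
        return SK_FIELD_PREP(CBAR_TYPE, type);  /* vs. open-coded shifts */
}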
Robin Murphy authored
As with arm-smmu from whence this code was borrowed, the IOVAs passed in here happen to be at least page-aligned anyway, but still; oh dear. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy authored
The less said about "~12UL" the better. Oh dear. We get away with it due to calling constraints that mean IOVAs are implicitly at least page-aligned to begin with, but still; oh dear. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
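The bug class in miniature: 12 is a shift amount, not a mask, so ~12UL clears only bits 2 and 3 rather than the 4K page offset.

#include <stdio.h>

int main(void)
{
        unsigned long iova = 0x12345678UL;

        printf("%#lx\n", iova & ~12UL);     /* 0x12345674: wrong */
        printf("%#lx\n", iova & ~0xfffUL);  /* 0x12345000: page-aligned */
        return 0;
}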
- 08 Aug, 2019 2 commits
Will Deacon authored
Update the iommu_iotlb_gather structure passed to ->tlb_add_page() and use this information to defer all TLB invalidation until ->iotlb_sync(). This drastically reduces contention on the command queue, since we can insert our commands in batches rather than one-by-one. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
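A sketch of the deferral (it mirrors the iommu_iotlb_gather idea, but all names here are illustrative): ->tlb_add_page() only widens a pending range, and a single flush happens in ->iotlb_sync().

#include <stddef.h>
#include <limits.h>

static void issue_tlbi_range(unsigned long iova, size_t size) { /* batch */ }

struct gather_sketch {
        unsigned long start, end;   /* accumulated invalidation range */
};

static void gather_init(struct gather_sketch *g)
{
        g->start = ULONG_MAX;
        g->end = 0;
}

static void tlb_add_page(struct gather_sketch *g, unsigned long iova,
                         size_t granule)
{
        if (iova < g->start)
                g->start = iova;
        if (iova + granule > g->end)
                g->end = iova + granule;
        /* no command-queue traffic here */
}

static void iotlb_sync(struct gather_sketch *g)
{
        if (g->end > g->start)
                issue_tlbi_range(g->start, g->end - g->start); /* one batch */
        gather_init(g);
}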
Will Deacon authored
The SMMU command queue is a bottleneck in large systems, thanks to the spin_lock which serialises accesses from all CPUs to the single queue supported by the hardware. Attempt to improve this situation by moving to a new algorithm for inserting commands into the queue, which is lock-free on the fast-path. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
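Heavily simplified, the fast path looks like this (a sketch only; the real algorithm additionally handles wrap bits, a full queue and CMD_SYNC completion tracking): each producer claims space with a single atomic update, then fills its slots without holding any lock.

#include <stdatomic.h>
#include <stdint.h>

struct cmdq_sketch {
        _Atomic uint32_t prod;   /* shared producer index */
};

static uint32_t cmdq_claim(struct cmdq_sketch *q, uint32_t n_cmds)
{
        /* One atomic op replaces the global spinlock for space allocation;
           the caller then writes its commands into the returned slots. */
        return atomic_fetch_add_explicit(&q->prod, n_cmds,
                                         memory_order_relaxed);
}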
- 29 Jul, 2019 10 commits
Will Deacon authored
In preparation for rewriting the command queue insertion code to use a new algorithm, rework many of our queue macro accessors and manipulation functions so that they operate on the arm_smmu_ll_queue structure where possible. This will allow us to call these helpers on local variables without having to construct a full-blown arm_smmu_queue on the stack. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
In preparation for rewriting the command queue insertion code to use a new algorithm, introduce a new arm_smmu_ll_queue structure which contains only the information necessary to perform queue arithmetic for a queue and will later be extended so that we can perform complex atomic manipulation on some of the fields. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
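A sketch of the queue arithmetic involved (it mirrors the arm_smmu_ll_queue concept; names are illustrative): prod and cons each carry a wrap bit just above the index bits, so empty versus full falls out of two comparisons.

#include <stdint.h>
#include <stdbool.h>

struct ll_queue_sketch {
        uint32_t prod, cons;
        uint32_t max_n_shift;   /* log2(number of entries) */
};

static uint32_t q_idx(struct ll_queue_sketch *q, uint32_t p)
{
        return p & ((1u << q->max_n_shift) - 1);
}

static uint32_t q_wrap(struct ll_queue_sketch *q, uint32_t p)
{
        return p & (1u << q->max_n_shift);
}

static bool q_empty(struct ll_queue_sketch *q)
{
        return q_idx(q, q->prod) == q_idx(q, q->cons) &&
               q_wrap(q, q->prod) == q_wrap(q, q->cons);
}

static bool q_full(struct ll_queue_sketch *q)
{
        return q_idx(q, q->prod) == q_idx(q, q->cons) &&
               q_wrap(q, q->prod) != q_wrap(q, q->cons);
}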
Will Deacon authored
The Q_OVF macro doesn't need to access the arm_smmu_queue structure, so drop the unused macro argument. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
In preparation for rewriting the command queue insertion code to use a new algorithm, separate the software and hardware views of the prod and cons indexes so that manipulating the software state doesn't automatically update the hardware state at the same time. No functional change. Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
With all the pieces in place, we can finally propagate the iommu_iotlb_gather structure from the call to unmap() down to the IOMMU drivers' implementation of ->tlb_add_page(). Currently everybody ignores it, but the machinery is now there to defer invalidation. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
Update the io-pgtable ->unmap() function to take an iommu_iotlb_gather pointer as an argument, and update the callers as appropriate. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
The ->tlb_sync() callback is no longer used, so it can be removed. Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
The ->tlb_add_flush() callback in the io-pgtable API now looks a bit silly: - It takes a size and a granule, which are always the same - It takes a 'bool leaf', which is always true - It only ever flushes a single page With that in mind, replace it with an optional ->tlb_add_page() callback that drops the useless parameters. Signed-off-by: Will Deacon <will@kernel.org>
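The resulting callback table looks roughly like this (modelled on the flush-ops structure this series shapes; treat it as an approximation rather than the exact mainline definition):

#include <stddef.h>

struct iommu_iotlb_gather;      /* opaque here */

struct flush_ops_sketch {
        void (*tlb_flush_all)(void *cookie);
        void (*tlb_flush_walk)(unsigned long iova, size_t size,
                               size_t granule, void *cookie);
        void (*tlb_flush_leaf)(unsigned long iova, size_t size,
                               size_t granule, void *cookie);
        /* optional: one page, one granule, no redundant size/leaf args */
        void (*tlb_add_page)(struct iommu_iotlb_gather *gather,
                             unsigned long iova, size_t granule,
                             void *cookie);
};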
Will Deacon authored
Now that all IOMMU drivers using the io-pgtable API implement the ->tlb_flush_walk() and ->tlb_flush_leaf() callbacks, we can use them in the io-pgtable code instead of ->tlb_add_flush() immediately followed by ->tlb_sync(). Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon authored
Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers using the io-pgtable API so that we can start making use of them in the page-table code. For now, they can just wrap the implementations of ->tlb_add_flush and ->tlb_sync pending future optimisation in each driver. Signed-off-by: Will Deacon <will@kernel.org>