Commit 9044adca authored by Michael Ellerman

Merge branch 'topic/ppc-kvm' into next

Merge our ppc-kvm topic branch to bring in the Ultravisor support
patches.
parents 07aa1e78 68e0aa8e
==========================
ELF Note PowerPC Namespace
==========================
The PowerPC namespace in an ELF Note of the kernel binary is used to store
capabilities and information which can be used by a bootloader or userland.
Types and Descriptors
---------------------
The types to be used with the "PowerPC" namespace are defined in
arch/powerpc/include/asm/elfnote.h.
1) PPC_ELFNOTE_CAPABILITIES
Define the capabilities supported/required by the kernel. This type uses a
bitmap as "descriptor" field. Each bit is described below:
- Ultravisor-capable bit (PowerNV only).
#define PPCCAP_ULTRAVISOR_BIT (1 << 0)
Indicate that the powerpc kernel binary knows how to run in an
ultravisor-enabled system.
In an ultravisor-enabled system, some machine resources are now controlled
by the ultravisor. If the kernel is not ultravisor-capable, but it ends up
being run on a machine with ultravisor, the kernel will probably crash
trying to access ultravisor resources. For instance, it may crash in early
boot trying to set the partition table entry 0.
In an ultravisor-enabled system, a bootloader could warn the user or prevent
the kernel from being run if the PowerPC ultravisor capability doesn't exist
or the Ultravisor-capable bit is not set.
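As an illustration, a bootloader could scan the kernel image's PT_NOTE
segments for the "PowerPC" note before booting. The following is a
minimal sketch, assuming the image is a plain ELF64 file already read
into memory; endianness conversion and bounds checking are elided, and
``kernel_is_ultravisor_capable()`` is a hypothetical helper.

.. code-block:: c

#include <elf.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PPC_ELFNOTE_CAPABILITIES 1
#define PPCCAP_ULTRAVISOR_BIT (1 << 0)

/* ELF note name and descriptor fields are padded to 4 bytes. */
#define NOTE_ALIGN(x) (((x) + 3) & ~3UL)

static bool kernel_is_ultravisor_capable(const uint8_t *image)
{
	const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)image;
	const Elf64_Phdr *phdr = (const Elf64_Phdr *)(image + ehdr->e_phoff);

	for (int i = 0; i < ehdr->e_phnum; i++) {
		const uint8_t *p, *end;

		if (phdr[i].p_type != PT_NOTE)
			continue;

		p = image + phdr[i].p_offset;
		end = p + phdr[i].p_filesz;

		while (p + sizeof(Elf64_Nhdr) <= end) {
			const Elf64_Nhdr *n = (const Elf64_Nhdr *)p;
			const char *name = (const char *)(n + 1);
			const uint8_t *desc = (const uint8_t *)name +
					      NOTE_ALIGN(n->n_namesz);

			if (n->n_type == PPC_ELFNOTE_CAPABILITIES &&
			    n->n_namesz == sizeof("PowerPC") &&
			    !memcmp(name, "PowerPC", sizeof("PowerPC"))) {
				uint32_t caps;

				memcpy(&caps, desc, sizeof(caps));
				return caps & PPCCAP_ULTRAVISOR_BIT;
			}
			p = desc + NOTE_ALIGN(n->n_descsz);
		}
	}
	return false;
}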
References
----------
arch/powerpc/include/asm/elfnote.h
arch/powerpc/kernel/note.S
.. SPDX-License-Identifier: GPL-2.0
.. _ultravisor:
============================
Protected Execution Facility
============================
.. contents::
:depth: 3
.. sectnum::
:depth: 3
Protected Execution Facility
############################
Protected Execution Facility (PEF) is an architectural change for
POWER 9 that enables Secure Virtual Machines (SVMs). DD2.3 chips
(PVR=0x004e1203) or greater will be PEF-capable. A new ISA release
will include the PEF RFC02487 changes.
When enabled, PEF adds a new higher privileged mode, called Ultravisor
mode, to POWER architecture. Along with the new mode there is new
firmware called the Protected Execution Ultravisor (or Ultravisor
for short). Ultravisor mode is the highest privileged mode in POWER
architecture.
+------------------+
| Privilege States |
+==================+
| Problem |
+------------------+
| Supervisor |
+------------------+
| Hypervisor |
+------------------+
| Ultravisor |
+------------------+
PEF protects SVMs from the hypervisor, privileged users, and other
VMs in the system. SVMs are protected while at rest and can only be
executed by an authorized machine. All virtual machines utilize
hypervisor services. The Ultravisor filters calls between the SVMs
and the hypervisor to assure that information does not accidentally
leak. All hypercalls except H_RANDOM are reflected to the hypervisor.
H_RANDOM is not reflected to prevent the hypervisor from influencing
random values in the SVM.
To support this there is a refactoring of the ownership of resources
in the CPU. Some of the resources which were previously hypervisor
privileged are now ultravisor privileged.
Hardware
========
The hardware changes include the following:
* There is a new bit in the MSR that determines whether the current
process is running in secure mode, MSR(S) bit 41. MSR(S)=1 means the
process is in secure mode; MSR(S)=0 means it is in normal mode.
* The MSR(S) bit can only be set by the Ultravisor.
* HRFID cannot be used to set the MSR(S) bit. If the hypervisor needs
to return to an SVM it must use an ultracall. It can determine if
the VM it is returning to is secure.
* There is a new Ultravisor privileged register, SMFCTRL, which has an
enable/disable bit SMFCTRL(E).
* The privilege of a process is now determined by three MSR bits,
MSR(S, HV, PR). In each of the tables below the modes are listed
from least privilege to highest privilege. The higher privilege
modes can access all the resources of the lower privilege modes.
(A small decoding sketch follows this list.)
**Secure Mode MSR Settings**
+---+---+---+---------------+
| S | HV| PR|Privilege |
+===+===+===+===============+
| 1 | 0 | 1 | Problem |
+---+---+---+---------------+
| 1 | 0 | 0 | Privileged(OS)|
+---+---+---+---------------+
| 1 | 1 | 0 | Ultravisor |
+---+---+---+---------------+
| 1 | 1 | 1 | Reserved |
+---+---+---+---------------+
**Normal Mode MSR Settings**
+---+---+---+---------------+
| S | HV| PR|Privilege |
+===+===+===+===============+
| 0 | 0 | 1 | Problem |
+---+---+---+---------------+
| 0 | 0 | 0 | Privileged(OS)|
+---+---+---+---------------+
| 0 | 1 | 0 | Hypervisor |
+---+---+---+---------------+
| 0 | 1 | 1 | Problem (Host)|
+---+---+---+---------------+
* Memory is partitioned into secure and normal memory. Only processes
that are running in secure mode can access secure memory.
* The hardware does not allow anything that is not running secure to
access secure memory. This means that the Hypervisor cannot access
the memory of the SVM without using an ultracall (asking the
Ultravisor). The Ultravisor will only allow the hypervisor to see
the SVM memory encrypted.
* I/O systems are not allowed to directly address secure memory. This
limits the SVMs to virtual I/O only.
* The architecture allows the SVM to share pages of memory with the
hypervisor that are not protected with encryption. However, this
sharing must be initiated by the SVM.
* When a process is running in secure mode all hypercalls
(syscall lev=1) go to the Ultravisor.
* When a process is in secure mode all interrupts go to the
Ultravisor.
* The following resources have become Ultravisor privileged and
require an Ultravisor interface to manipulate:
* Processor configurations registers (SCOMs).
* Stop state information.
* The debug registers CIABR, DAWR, and DAWRX when SMFCTRL(D) is set.
If SMFCTRL(D) is not set they do not work in secure mode. When
SMFCTRL(D) is set, reading and writing them requires an Ultravisor
call; a direct access causes a Hypervisor Emulation Assistance
interrupt.
* PTCR and partition table entries (the partition table is in secure
memory). An attempt to write to PTCR will cause a Hypervisor
Emulation Assistance interrupt.
* LDBAR (LD Base Address Register) and IMC (In-Memory Collection)
non-architected registers. An attempt to write to them will cause a
Hypervisor Emulation Assistance interrupt.
* Paging for an SVM, sharing of memory with Hypervisor for an SVM.
(Including Virtual Processor Area (VPA) and virtual I/O).
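The privilege tables above can also be expressed in code. Below is a
small decoding sketch (not kernel code); the bit positions follow the
kernel's MSR definitions, where IBM bit 41 for S corresponds to bit 22
counting from the least-significant end.

.. code-block:: c

#include <stdint.h>

#define MSR_S	(1ULL << 22)	/* Secure mode, IBM bit 41 */
#define MSR_HV	(1ULL << 60)	/* Hypervisor state, IBM bit 3 */
#define MSR_PR	(1ULL << 14)	/* Problem state, IBM bit 49 */

static const char *privilege_mode(uint64_t msr)
{
	int s = !!(msr & MSR_S);
	int hv = !!(msr & MSR_HV);
	int pr = !!(msr & MSR_PR);

	if (pr)
		return (s && hv) ? "Reserved" :
		       (hv ? "Problem (Host)" : "Problem");
	if (!hv)
		return "Privileged (OS)";
	return s ? "Ultravisor" : "Hypervisor";
}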
Software/Microcode
==================
The software changes include:
* SVMs are created from normal VM using (open source) tooling supplied
by IBM.
* All SVMs start as normal VMs and utilize an ultracall, UV_ESM
(Enter Secure Mode), to make the transition.
* When the UV_ESM ultracall is made the Ultravisor copies the VM into
secure memory, decrypts the verification information, and checks the
integrity of the SVM. If the integrity check passes the Ultravisor
passes control in secure mode.
* The verification information includes the pass phrase for the
encrypted disk associated with the SVM. This pass phrase is given
to the SVM when requested.
* The Ultravisor is not involved in protecting the encrypted disk of
the SVM while at rest.
* For external interrupts the Ultravisor saves the state of the SVM,
and reflects the interrupt to the hypervisor for processing.
For hypercalls, the Ultravisor inserts neutral state into all
registers not needed for the hypercall, then reflects the call to
the hypervisor for processing. The H_RANDOM hypercall is performed
by the Ultravisor and not reflected.
* For virtual I/O to work, bounce buffering must be done (see the
sketch after this list).
* The Ultravisor uses AES (IAPM) for protection of SVM memory. IAPM
is a mode of AES that provides integrity and secrecy concurrently.
* The movement of data between normal and secure pages is coordinated
with the Ultravisor by a new HMM plug-in in the Hypervisor.
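The bounce buffering mentioned above amounts to staging I/O payloads
through memory the Hypervisor can see. A conceptual sketch follows; in
Linux this role is played by the SWIOTLB bounce-buffer machinery, with
the pool assumed to live in pages the SVM has shared via
``UV_SHARE_PAGE``. ``bounce_out()`` is hypothetical.

.. code-block:: c

#include <string.h>

/*
 * Conceptual only: devices (serviced by the Hypervisor) can only see
 * shared memory, so the payload is copied out of secure memory before
 * the virtual I/O transfer starts.
 */
static void *bounce_out(void *shared_pool, const void *secure_buf,
			size_t len)
{
	memcpy(shared_pool, secure_buf, len);	/* stage for virtual I/O */
	return shared_pool;	/* address handed to the device model */
}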
The Ultravisor offers new services to the hypervisor and SVMs. These
are accessed through ultracalls.
Terminology
===========
* Hypercalls: special system calls used to request services from
Hypervisor.
* Normal memory: Memory that is accessible to Hypervisor.
* Normal page: Page backed by normal memory and available to
Hypervisor.
* Shared page: A page backed by normal memory and available to both
the Hypervisor/QEMU and the SVM (i.e page has mappings in SVM and
Hypervisor/QEMU).
* Secure memory: Memory that is accessible only to Ultravisor and
SVMs.
* Secure page: Page backed by secure memory and only available to
Ultravisor and SVM.
* SVM: Secure Virtual Machine.
* Ultracalls: special system calls used to request services from
Ultravisor.
Ultravisor calls API
####################
This section describes Ultravisor calls (ultracalls) needed to
support Secure Virtual Machines (SVMs) and Paravirtualized KVM. The
ultracalls allow the SVMs and Hypervisor to request services from the
Ultravisor such as accessing a register or memory region that can only
be accessed when running in Ultravisor-privileged mode.
The specific service needed from an ultracall is specified in register
R3 (the first parameter to the ultracall). Other parameters to the
ultracall, if any, are specified in registers R4 through R12.
Return value of all ultracalls is in register R3. Other output values
from the ultracall, if any, are returned in registers R4 through R12.
The only exception to this register usage is the ``UV_RETURN``
ultracall described below.
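For example, using the ``ucall_norets()`` helper added by this commit,
a call such as ``UV_WRITE_PATE`` follows this convention. This is a
sketch only; ``write_pate_example()`` is a hypothetical caller modeled
on ``uv_register_pate()`` from this commit.

.. code-block:: c

static int write_pate_example(u64 lpid, u64 dw0, u64 dw1)
{
	/* R3 carries the service number in and the status out. */
	long ret = ucall_norets(UV_WRITE_PATE, lpid, dw0, dw1);

	return (ret == U_SUCCESS) ? 0 : -EIO;
}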
Each ultracall returns specific error codes, applicable in the context
of the ultracall. However, as with the Power Architecture Platform
Reference (PAPR), if no specific error code is defined for a
particular situation, then the ultracall will fall back to a
parameter-position based error code, i.e. U_PARAMETER, U_P2, U_P3,
etc., depending on the ultracall parameter that may have caused the
error.
Some ultracalls involve transferring a page of data between Ultravisor
and Hypervisor. Secure pages that are transferred from secure memory
to normal memory may be encrypted using dynamically generated keys.
When the secure pages are transferred back to secure memory, they may
be decrypted using the same dynamically generated keys. Generation and
management of these keys will be covered in a separate document.
For now this only covers ultracalls currently implemented and being
used by the Hypervisor and SVMs, but others can be added here when it
makes sense.
The full specification for all hypercalls/ultracalls will eventually
be made available in the public/OpenPower version of the PAPR
specification.
**Note**
If PEF is not enabled, the ultracalls will be redirected to the
Hypervisor which must handle/fail the calls.
Ultracalls used by Hypervisor
=============================
This section describes the virtual memory management ultracalls used
by the Hypervisor to manage SVMs.
UV_PAGE_OUT
-----------
Encrypt and move the contents of a page from secure memory to normal
memory.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_PAGE_OUT,
uint16_t lpid, /* LPAR ID */
uint64_t dest_ra, /* real address of destination page */
uint64_t src_gpa, /* source guest-physical-address */
uint8_t flags, /* flags */
uint64_t order) /* page size order */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_PARAMETER if ``lpid`` is invalid.
* U_P2 if ``dest_ra`` is invalid.
* U_P3 if the ``src_gpa`` address is invalid.
* U_P4 if any bit in the ``flags`` is unrecognized.
* U_P5 if the ``order`` parameter is unsupported.
* U_FUNCTION if functionality is not supported.
* U_BUSY if page cannot be currently paged-out.
Description
~~~~~~~~~~~
Encrypt the contents of a secure-page and make it available to
Hypervisor in a normal page.
By default, the source page is unmapped from the SVM's partition-
scoped page table. But the Hypervisor can provide a hint to the
Ultravisor to retain the page mapping by setting the ``UV_SNAPSHOT``
flag in ``flags`` parameter.
If the source page is already a shared page the call returns
U_SUCCESS, without doing anything.
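A sketch of the Hypervisor side of use case 1 below, assuming kernel
context; the ``UV_PAGE_OUT`` opcode value is not defined in this
commit and ``hv_fetch_encrypted()`` is hypothetical.

.. code-block:: c

/* Materialize an encrypted copy of an SVM page in normal memory. */
static struct page *hv_fetch_encrypted(u64 lpid, u64 gpa, u64 page_shift)
{
	struct page *page = alloc_page(GFP_KERNEL);

	if (!page)
		return NULL;
	/* flags == 0: no UV_SNAPSHOT, so the page is unmapped from the SVM */
	if (ucall_norets(UV_PAGE_OUT, lpid, page_to_phys(page), gpa,
			 0, page_shift) != U_SUCCESS) {
		__free_page(page);
		return NULL;
	}
	return page;
}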
Use cases
~~~~~~~~~
#. QEMU attempts to access an address belonging to the SVM but the
page frame for that address is not mapped into QEMU's address
space. In this case, the Hypervisor will allocate a page frame,
map it into QEMU's address space and issue the ``UV_PAGE_OUT``
call to retrieve the encrypted contents of the page.
#. When Ultravisor runs low on secure memory and it needs to page-out
an LRU page. In this case, Ultravisor will issue the
``H_SVM_PAGE_OUT`` hypercall to the Hypervisor. The Hypervisor will
then allocate a normal page and issue the ``UV_PAGE_OUT`` ultracall
and the Ultravisor will encrypt and move the contents of the secure
page into the normal page.
#. When Hypervisor accesses SVM data, the Hypervisor requests the
Ultravisor to transfer the corresponding page into an insecure page,
which the Hypervisor can access. The data in the normal page will
be encrypted though.
UV_PAGE_IN
----------
Move the contents of a page from normal memory to secure memory.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_PAGE_IN,
uint16_t lpid, /* the LPAR ID */
uint64_t src_ra, /* source real address of page */
uint64_t dest_gpa, /* destination guest physical address */
uint64_t flags, /* flags */
uint64_t order) /* page size order */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_BUSY if page cannot be currently paged-in.
* U_FUNCTION if functionality is not supported.
* U_PARAMETER if ``lpid`` is invalid.
* U_P2 if ``src_ra`` is invalid.
* U_P3 if the ``dest_gpa`` address is invalid.
* U_P4 if any bit in the ``flags`` is unrecognized.
* U_P5 if the ``order`` parameter is unsupported.
Description
~~~~~~~~~~~
Move the contents of the page identified by ``src_ra`` from normal
memory to secure memory and map it to the guest physical address
``dest_gpa``.
If ``dest_gpa`` refers to a shared address, map the page into the
partition-scoped page-table of the SVM. If ``dest_gpa`` is not shared,
copy the contents of the page into the corresponding secure page.
Depending on the context, decrypt the page before being copied.
The caller provides the attributes of the page through the ``flags``
parameter. Valid values for ``flags`` are:
* CACHE_INHIBITED
* CACHE_ENABLED
* WRITE_PROTECTION
The Hypervisor must pin the page in memory before making the
``UV_PAGE_IN`` ultracall.
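A sketch of the pin-then-call sequence implied above, assuming KVM HV
kernel context; ``hv_page_in()`` is hypothetical, the ``UV_PAGE_IN``
opcode value is not defined in this commit, and page lifetime
management is simplified.

.. code-block:: c

static long hv_page_in(struct kvm *kvm, struct page *page, u64 gpa,
		       u64 flags, u64 page_shift)
{
	long ret;

	get_page(page);		/* keep the source page resident */
	ret = ucall_norets(UV_PAGE_IN, kvm->arch.lpid, page_to_phys(page),
			   gpa, flags, page_shift);
	if (ret != U_SUCCESS)
		put_page(page);	/* Ultravisor did not take the contents */
	return ret;
}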
Use cases
~~~~~~~~~
#. When a normal VM switches to secure mode, all its pages residing
in normal memory are moved into secure memory.
#. When an SVM requests to share a page with Hypervisor the Hypervisor
allocates a page and informs the Ultravisor.
#. When an SVM accesses a secure page that has been paged-out,
Ultravisor invokes the Hypervisor to locate the page. After
locating the page, the Hypervisor uses UV_PAGE_IN to make the
page available to Ultravisor.
UV_PAGE_INVAL
-------------
Invalidate the Ultravisor mapping of a page.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_PAGE_INVAL,
uint16_t lpid, /* the LPAR ID */
uint64_t guest_pa, /* destination guest-physical-address */
uint64_t order) /* page size order */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_PARAMETER if ``lpid`` is invalid.
* U_P2 if ``guest_pa`` is invalid (or corresponds to a secure
page mapping).
* U_P3 if the ``order`` is invalid.
* U_FUNCTION if functionality is not supported.
* U_BUSY if page cannot be currently invalidated.
Description
~~~~~~~~~~~
This ultracall informs Ultravisor that the page mapping in Hypervisor
corresponding to the given guest physical address has been invalidated
and that the Ultravisor should not access the page. If the specified
``guest_pa`` corresponds to a secure page, Ultravisor will ignore the
attempt to invalidate the page and return U_P2.
Use cases
~~~~~~~~~
#. When a shared page is unmapped from QEMU's page table, possibly
because it is paged-out to disk, Ultravisor needs to know that the
page should not be accessed from its side too.
UV_WRITE_PATE
-------------
Validate and write the partition table entry (PATE) for a given
partition.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_WRITE_PATE,
uint32_t lpid, /* the LPAR ID */
uint64_t dw0 /* the first double word to write */
uint64_t dw1) /* the second double word to write */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_BUSY if PATE cannot be currently written to.
* U_FUNCTION if functionality is not supported.
* U_PARAMETER if ``lpid`` is invalid.
* U_P2 if ``dw0`` is invalid.
* U_P3 if the ``dw1`` address is invalid.
* U_PERMISSION if the Hypervisor is attempting to change the PATE
of a secure virtual machine or if called from a
context other than Hypervisor.
Description
~~~~~~~~~~~
Validate the LPID and write its partition table entry (PATE). If the
LPID is already allocated and initialized, this call results in
changing the partition table entry.
Use cases
~~~~~~~~~
#. The partition table resides in secure memory and its entries,
called PATEs (Partition Table Entries), point to the partition-
scoped page tables for the Hypervisor as well as each of the
virtual machines (both secure and normal). The Hypervisor
operates in partition 0 and its partition-scoped page tables
reside in normal memory.
#. This ultracall allows the Hypervisor to register the partition-
scoped and process-scoped page table entries for the Hypervisor
and other partitions (virtual machines) with the Ultravisor.
#. If the value of the PATE for an existing partition (VM) changes,
the TLB cache for the partition is flushed.
#. The Hypervisor is responsible for allocating LPID. The LPID and
its PATE entry are registered together. The Hypervisor manages
the PATE entries for a normal VM and can change the PATE entry
anytime. Ultravisor manages the PATE entries for an SVM and
Hypervisor is not allowed to modify them.
UV_RETURN
---------
Return control from the Hypervisor back to the Ultravisor after
processing a hypercall or interrupt that was forwarded (aka
*reflected*) to the Hypervisor.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_RETURN)
Return values
~~~~~~~~~~~~~
This call never returns to Hypervisor on success. It returns
U_INVALID if ultracall is not made from a Hypervisor context.
Description
~~~~~~~~~~~
When an SVM makes a hypercall or incurs some other exception, the
Ultravisor usually forwards (aka *reflects*) the exceptions to the
Hypervisor. After processing the exception, Hypervisor uses the
``UV_RETURN`` ultracall to return control back to the SVM.
The expected register state on entry to this ultracall is:
* Non-volatile registers are restored to their original values.
* If returning from a hypercall, register R0 contains the return
value (**unlike other ultracalls**) and registers R4 through R12
contain any output values of the hypercall.
* R3 contains the ultracall number, i.e. UV_RETURN.
* If returning with a synthesized interrupt, R2 contains the
synthesized interrupt number.
Use cases
~~~~~~~~~
#. Ultravisor relies on the Hypervisor to provide several services to
the SVM such as processing hypercall and other exceptions. After
processing the exception, Hypervisor uses UV_RETURN to return
control back to the Ultravisor.
#. Hypervisor has to use this ultracall to return control to the SVM.
UV_REGISTER_MEM_SLOT
--------------------
Register an SVM address-range with specified properties.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_REGISTER_MEM_SLOT,
uint64_t lpid, /* LPAR ID of the SVM */
uint64_t start_gpa, /* start guest physical address */
uint64_t size, /* size of address range in bytes */
uint64_t flags /* reserved for future expansion */
uint16_t slotid) /* slot identifier */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_PARAMETER if ``lpid`` is invalid.
* U_P2 if ``start_gpa`` is invalid.
* U_P3 if ``size`` is invalid.
* U_P4 if any bit in the ``flags`` is unrecognized.
* U_P5 if the ``slotid`` parameter is unsupported.
* U_PERMISSION if called from context other than Hypervisor.
* U_FUNCTION if functionality is not supported.
Description
~~~~~~~~~~~
Register a memory range for an SVM. The memory range starts at the
guest physical address ``start_gpa`` and is ``size`` bytes long.
Use cases
~~~~~~~~~
#. When a virtual machine goes secure, all the memory slots managed by
the Hypervisor move into secure memory. The Hypervisor iterates
through each of the memory slots and registers the slot with the
Ultravisor, as sketched below. Hypervisor may discard some slots
such as those used for firmware (SLOF).
#. When new memory is hot-plugged, a new memory slot gets registered.
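A sketch of the memslot walk from use case 1 above, assuming KVM HV
kernel context; the ``UV_REGISTER_MEM_SLOT`` opcode value is not
defined in this commit, ``register_all_memslots()`` is hypothetical,
and SRCU locking around the memslot list is elided.

.. code-block:: c

static int register_all_memslots(struct kvm *kvm)
{
	struct kvm_memslots *slots = kvm_memslots(kvm);
	struct kvm_memory_slot *memslot;
	long ret = U_SUCCESS;

	kvm_for_each_memslot(memslot, slots) {
		ret = ucall_norets(UV_REGISTER_MEM_SLOT, kvm->arch.lpid,
				   memslot->base_gfn << PAGE_SHIFT,
				   memslot->npages << PAGE_SHIFT,
				   0 /* flags: reserved */, memslot->id);
		if (ret != U_SUCCESS)
			break;
	}
	return ret == U_SUCCESS ? 0 : -EINVAL;
}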
UV_UNREGISTER_MEM_SLOT
----------------------
Unregister an SVM address-range that was previously registered using
UV_REGISTER_MEM_SLOT.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_UNREGISTER_MEM_SLOT,
uint64_t lpid, /* LPAR ID of the SVM */
uint64_t slotid) /* reservation slotid */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_FUNCTION if functionality is not supported.
* U_PARAMETER if ``lpid`` is invalid.
* U_P2 if ``slotid`` is invalid.
* U_PERMISSION if called from context other than Hypervisor.
Description
~~~~~~~~~~~
Release the memory slot identified by ``slotid`` and free any
resources allocated towards the reservation.
Use cases
~~~~~~~~~
#. Memory hot-remove.
UV_SVM_TERMINATE
----------------
Terminate an SVM and release its resources.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_SVM_TERMINATE,
uint64_t lpid) /* LPAR ID of the SVM */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_FUNCTION if functionality is not supported.
* U_PARAMETER if ``lpid`` is invalid.
* U_INVALID if VM is not secure.
* U_PERMISSION if not called from a Hypervisor context.
Description
~~~~~~~~~~~
Terminate an SVM and release all its resources.
Use cases
~~~~~~~~~
#. Called by Hypervisor when terminating an SVM.
Ultracalls used by SVM
======================
UV_SHARE_PAGE
-------------
Share a set of guest physical pages with the Hypervisor.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_SHARE_PAGE,
uint64_t gfn, /* guest page frame number */
uint64_t num) /* number of pages of size PAGE_SIZE */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_FUNCTION if functionality is not supported.
* U_INVALID if the VM is not secure.
* U_PARAMETER if ``gfn`` is invalid.
* U_P2 if ``num`` is invalid.
Description
~~~~~~~~~~~
Share the ``num`` pages starting at guest physical frame number ``gfn``
with the Hypervisor. Assume page size is PAGE_SIZE bytes. Zero the
pages before returning.
If the address is already backed by a secure page, unmap the page and
back it with an insecure page, with the help of the Hypervisor. If it
is not backed by any page yet, mark the PTE as insecure and back it
with an insecure page when the address is accessed. If it is already
backed by an insecure page, zero the page and return.
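An SVM-side sketch, assuming kernel context; ``uv_share_page()`` is
assumed to wrap ``ucall_norets()`` with the ``UV_SHARE_PAGE`` opcode,
whose value is not defined in this commit, and
``share_buffer_with_hv()`` is hypothetical.

.. code-block:: c

static inline int uv_share_page(u64 gfn, u64 num)
{
	return ucall_norets(UV_SHARE_PAGE, gfn, num);
}

/* Share one page-aligned buffer so the Hypervisor can read it. */
static int share_buffer_with_hv(void *buf)
{
	if (uv_share_page(__pa(buf) >> PAGE_SHIFT, 1) != U_SUCCESS)
		return -EIO;
	return 0;
}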
Use cases
~~~~~~~~~
#. The Hypervisor cannot access the SVM pages since they are backed by
secure pages. Hence an SVM must explicitly ask the Ultravisor for
pages it can share with the Hypervisor.
#. Shared pages are needed to support virtio and Virtual Processor Area
(VPA) in SVMs.
UV_UNSHARE_PAGE
---------------
Restore a shared SVM page to its initial state.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_UNSHARE_PAGE,
uint64_t gfn, /* guest page frame number */
uint64_t num) /* number of pages of size PAGE_SIZE */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_FUNCTION if functionality is not supported.
* U_INVALID if VM is not secure.
* U_PARAMETER if ``gfn`` is invalid.
* U_P2 if ``num`` is invalid.
Description
~~~~~~~~~~~
Stop sharing ``num`` pages starting at ``gfn`` with the Hypervisor.
Assume that the page size is PAGE_SIZE. Zero the pages before
returning.
If the address is already backed by an insecure page, unmap the page
and back it with a secure page. Inform the Hypervisor to release
reference to its shared page. If the address is not backed by a page
yet, mark the PTE as secure and back it with a secure page when that
address is accessed. If it is already backed by a secure page, zero
the page and return.
Use cases
~~~~~~~~~
#. The SVM may decide to unshare a page from the Hypervisor.
UV_UNSHARE_ALL_PAGES
--------------------
Unshare all pages the SVM has shared with Hypervisor.
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_UNSHARE_ALL_PAGES)
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success.
* U_FUNCTION if functionality is not supported.
* U_INVALID if VM is not secure.
Description
~~~~~~~~~~~
Unshare all shared pages from the Hypervisor. All unshared pages are
zeroed on return. Only pages explicitly shared by the SVM with the
Hypervisor (using UV_SHARE_PAGE ultracall) are unshared. Ultravisor
may internally share some pages with the Hypervisor without explicit
request from the SVM. These pages will not be unshared by this
ultracall.
Use cases
~~~~~~~~~
#. This call is needed when ``kexec`` is used to boot a different
kernel. It may also be needed during SVM reset.
UV_ESM
------
Secure the virtual machine (*enter secure mode*).
Syntax
~~~~~~
.. code-block:: c
uint64_t ultracall(const uint64_t UV_ESM,
uint64_t esm_blob_addr, /* location of the ESM blob */
uint64_t fdt) /* Flattened device tree */
Return values
~~~~~~~~~~~~~
One of the following values:
* U_SUCCESS on success (including if VM is already secure).
* U_FUNCTION if functionality is not supported.
* U_INVALID if VM is not secure.
* U_PARAMETER if ``esm_blob_addr`` is invalid.
* U_P2 if ``fdt`` is invalid.
* U_PERMISSION if any integrity checks fail.
* U_RETRY if there is insufficient memory to create the SVM.
* U_NO_KEY if the symmetric key is unavailable.
Description
~~~~~~~~~~~
Secure the virtual machine. On successful completion, return
control to the virtual machine at the address specified in the
ESM blob.
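A guest-side sketch of the transition, assuming kernel context; the
``UV_ESM`` opcode value is not defined in this commit and
``try_enter_secure_mode()`` is hypothetical. On success, execution
continues at the address given in the ESM blob rather than at the
instruction after the ultracall.

.. code-block:: c

static unsigned long try_enter_secure_mode(unsigned long esm_blob_addr,
					   unsigned long fdt)
{
	/* Returns here only on failure, with an error code in R3. */
	return ucall_norets(UV_ESM, esm_blob_addr, fdt);
}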
Use cases
~~~~~~~~~
#. A normal virtual machine can choose to switch to a secure mode.
Hypervisor Calls API
####################
This document describes the Hypervisor calls (hypercalls) that are
needed to support the Ultravisor. Hypercalls are services provided by
the Hypervisor to virtual machines and Ultravisor.
Register usage for these hypercalls is identical to that of the other
hypercalls defined in the Power Architecture Platform Reference (PAPR)
document: on input, register R3 identifies the specific service
that is being requested and registers R4 through R11 contain
additional parameters to the hypercall, if any. On output, register
R3 contains the return value and registers R4 through R9 contain any
other output values from the hypercall.
This document only covers hypercalls currently implemented/planned
for Ultravisor usage but others can be added here when it makes sense.
The full specification for all hypercalls/ultracalls will eventually
be made available in the public/OpenPower version of the PAPR
specification.
Hypervisor calls to support Ultravisor
======================================
The following hypercalls are needed to support the Ultravisor.
H_SVM_INIT_START
----------------
Begin the process of converting a normal virtual machine into an SVM.
Syntax
~~~~~~
.. code-block:: c
uint64_t hypercall(const uint64_t H_SVM_INIT_START)
Return values
~~~~~~~~~~~~~
One of the following values:
* H_SUCCESS on success.
Description
~~~~~~~~~~~
Initiate the process of securing a virtual machine. This involves
coordinating with the Ultravisor, using ultracalls, to allocate
resources in the Ultravisor for the new SVM, transferring the VM's
pages from normal to secure memory etc. When the process is
completed, Ultravisor issues the H_SVM_INIT_DONE hypercall.
Use cases
~~~~~~~~~
#. Ultravisor uses this hypercall to inform Hypervisor that a VM
has initiated the process of switching to secure mode.
H_SVM_INIT_DONE
---------------
Complete the process of securing an SVM.
Syntax
~~~~~~
.. code-block:: c
uint64_t hypercall(const uint64_t H_SVM_INIT_DONE)
Return values
~~~~~~~~~~~~~
One of the following values:
* H_SUCCESS on success.
* H_UNSUPPORTED if called from the wrong context (e.g.
from an SVM or before an H_SVM_INIT_START
hypercall).
Description
~~~~~~~~~~~
Complete the process of securing a virtual machine. This call must
be made after a prior call to ``H_SVM_INIT_START`` hypercall.
Use cases
~~~~~~~~~
On successfully securing a virtual machine, the Ultravisor informs
Hypervisor about it. Hypervisor can use this call to finish setting
up its internal state for this virtual machine.
H_SVM_PAGE_IN
-------------
Move the contents of a page from normal memory to secure memory.
Syntax
~~~~~~
.. code-block:: c
uint64_t hypercall(const uint64_t H_SVM_PAGE_IN,
uint64_t guest_pa, /* guest-physical-address */
uint64_t flags, /* flags */
uint64_t order) /* page size order */
Return values
~~~~~~~~~~~~~
One of the following values:
* H_SUCCESS on success.
* H_PARAMETER if ``guest_pa`` is invalid.
* H_P2 if ``flags`` is invalid.
* H_P3 if ``order`` of page is invalid.
Description
~~~~~~~~~~~
Retrieve the content of the page belonging to the VM at the specified
guest physical address.
Only valid value(s) in ``flags`` are:
* H_PAGE_IN_SHARED which indicates that the page is to be shared
with the Ultravisor.
* H_PAGE_IN_NONSHARED indicates that the UV is no longer
interested in the page. Applicable if the page is a shared page.
The ``order`` parameter must correspond to the configured page size.
Use cases
~~~~~~~~~
#. When a normal VM becomes a secure VM (using the UV_ESM ultracall),
the Ultravisor uses this hypercall to move contents of each page of
the VM from normal memory to secure memory.
#. Ultravisor uses this hypercall to ask Hypervisor to provide a page
in normal memory that can be shared between the SVM and Hypervisor.
#. Ultravisor uses this hypercall to page-in a paged-out page. This
can happen when the SVM touches a paged-out page.
#. If SVM wants to disable sharing of pages with Hypervisor, it can
inform Ultravisor to do so. Ultravisor will then use this hypercall
and inform Hypervisor that it has released access to the normal
page.
H_SVM_PAGE_OUT
---------------
Move the contents of the page to normal memory.
Syntax
~~~~~~
.. code-block:: c
uint64_t hypercall(const uint64_t H_SVM_PAGE_OUT,
uint64_t guest_pa, /* guest-physical-address */
uint64_t flags, /* flags (currently none) */
uint64_t order) /* page size order */
Return values
~~~~~~~~~~~~~
One of the following values:
* H_SUCCESS on success.
* H_PARAMETER if ``guest_pa`` is invalid.
* H_P2 if ``flags`` is invalid.
* H_P3 if ``order`` is invalid.
Description
~~~~~~~~~~~
Move the contents of the page identified by ``guest_pa`` to normal
memory.
Currently ``flags`` is unused and must be set to 0. The ``order``
parameter must correspond to the configured page size.
Use cases
~~~~~~~~~
#. If Ultravisor is running low on secure pages, it can move the
contents of some secure pages into normal pages using this
hypercall. The content will be encrypted.
References
##########
.. [1] `Supporting Protected Computing on IBM Power Architecture <https://developer.ibm.com/articles/l-support-protected-computing/>`_
......@@ -15,6 +15,7 @@
#include <asm/epapr_hcalls.h>
#include <asm/dcr.h>
#include <asm/mmu_context.h>
#include <asm/ultravisor-api.h>
#include <uapi/asm/ucontext.h>
......@@ -34,6 +35,16 @@ extern struct static_key hcall_tracepoint_key;
void __trace_hcall_entry(unsigned long opcode, unsigned long *args);
void __trace_hcall_exit(long opcode, long retval, unsigned long *retbuf);
/* Ultravisor */
#ifdef CONFIG_PPC_POWERNV
long ucall_norets(unsigned long opcode, ...);
#else
static inline long ucall_norets(unsigned long opcode, ...)
{
return U_NOT_AVAILABLE;
}
#endif
/* OPAL */
int64_t __opal_call(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
int64_t a4, int64_t a5, int64_t a6, int64_t a7,
......
/* SPDX-License-Identifier: GPL-2.0 */
/*
* PowerPC ELF notes.
*
* Copyright 2019, IBM Corporation
*/
#ifndef __ASM_POWERPC_ELFNOTE_H__
#define __ASM_POWERPC_ELFNOTE_H__
/*
* These note types should live in a SHT_NOTE segment and have
* "PowerPC" in the name field.
*/
/*
* The capabilities supported/required by this kernel (bitmap).
*
* This type uses a bitmap as "desc" field. Each bit is described
* in arch/powerpc/kernel/note.S
*/
#define PPC_ELFNOTE_CAPABILITIES 1
#endif /* __ASM_POWERPC_ELFNOTE_H__ */
......@@ -50,6 +50,7 @@
#define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000)
#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)
#define FW_FEATURE_PAPR_SCM ASM_CONST(0x0000002000000000)
#define FW_FEATURE_ULTRAVISOR ASM_CONST(0x0000004000000000)
#ifndef __ASSEMBLY__
......@@ -68,9 +69,9 @@ enum {
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
FW_FEATURE_PAPR_SCM,
FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_POWERNV_ALWAYS = 0,
FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1,
......
......@@ -48,15 +48,16 @@ struct iommu_table_ops {
* returns old TCE and DMA direction mask.
* @tce is a physical address.
*/
int (*exchange)(struct iommu_table *tbl,
int (*xchg_no_kill)(struct iommu_table *tbl,
long index,
unsigned long *hpa,
enum dma_data_direction *direction);
/* Real mode */
int (*exchange_rm)(struct iommu_table *tbl,
long index,
unsigned long *hpa,
enum dma_data_direction *direction);
enum dma_data_direction *direction,
bool realmode);
void (*tce_kill)(struct iommu_table *tbl,
unsigned long index,
unsigned long pages,
bool realmode);
__be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
#endif
......@@ -209,6 +210,12 @@ extern void iommu_del_device(struct device *dev);
extern long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl,
unsigned long entry, unsigned long *hpa,
enum dma_data_direction *direction);
extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
struct iommu_table *tbl,
unsigned long entry, unsigned long *hpa,
enum dma_data_direction *direction);
extern void iommu_tce_kill(struct iommu_table *tbl,
unsigned long entry, unsigned long pages);
#else
static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
......
......@@ -283,6 +283,7 @@ struct kvm_arch {
cpumask_t cpu_in_guest;
u8 radix;
u8 fwnmi_enabled;
u8 secure_guest;
bool threads_indep;
bool nested_enable;
pgd_t *pgtable;
......
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Ultravisor API.
*
* Copyright 2019, IBM Corporation.
*
*/
#ifndef _ASM_POWERPC_ULTRAVISOR_API_H
#define _ASM_POWERPC_ULTRAVISOR_API_H
#include <asm/hvcall.h>
/* Return codes */
#define U_BUSY H_BUSY
#define U_FUNCTION H_FUNCTION
#define U_NOT_AVAILABLE H_NOT_AVAILABLE
#define U_P2 H_P2
#define U_P3 H_P3
#define U_P4 H_P4
#define U_P5 H_P5
#define U_PARAMETER H_PARAMETER
#define U_PERMISSION H_PERMISSION
#define U_SUCCESS H_SUCCESS
/* opcodes */
#define UV_WRITE_PATE 0xF104
#define UV_RETURN 0xF11C
#endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Ultravisor definitions
*
* Copyright 2019, IBM Corporation.
*
*/
#ifndef _ASM_POWERPC_ULTRAVISOR_H
#define _ASM_POWERPC_ULTRAVISOR_H
#include <asm/asm-prototypes.h>
#include <asm/ultravisor-api.h>
#include <asm/firmware.h>
int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
int depth, void *data);
/*
* In ultravisor enabled systems, PTCR becomes ultravisor privileged only for
* writing and an attempt to write to it will cause a Hypervisor Emulation
* Assistance interrupt.
*/
static inline void set_ptcr_when_no_uv(u64 val)
{
if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
mtspr(SPRN_PTCR, val);
}
static inline int uv_register_pate(u64 lpid, u64 dw0, u64 dw1)
{
return ucall_norets(UV_WRITE_PATE, lpid, dw0, dw1);
}
#endif /* _ASM_POWERPC_ULTRAVISOR_H */
......@@ -53,7 +53,7 @@ obj-y := cputable.o ptrace.o syscalls.o \
dma-common.o
obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \
signal_64.o ptrace32.o \
paca.o nvram_64.o firmware.o
paca.o nvram_64.o firmware.o note.o
obj-$(CONFIG_VDSO32) += vdso32/
obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
......@@ -156,6 +156,7 @@ endif
obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o
obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o
obj-$(CONFIG_PPC_POWERNV) += ucall.o
# Disable GCOV, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_prom_init.o := n
......
......@@ -506,6 +506,7 @@ int main(void)
OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v);
OFFSET(KVM_RADIX, kvm, arch.radix);
OFFSET(KVM_FWNMI, kvm, arch.fwnmi_enabled);
OFFSET(KVM_SECURE_GUEST, kvm, arch.secure_guest);
OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr);
OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar);
OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
......
......@@ -1013,29 +1013,32 @@ int iommu_tce_check_gpa(unsigned long page_shift, unsigned long gpa)
}
EXPORT_SYMBOL_GPL(iommu_tce_check_gpa);
long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl,
extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
struct iommu_table *tbl,
unsigned long entry, unsigned long *hpa,
enum dma_data_direction *direction)
{
long ret;
unsigned long size = 0;
ret = tbl->it_ops->exchange(tbl, entry, hpa, direction);
ret = tbl->it_ops->xchg_no_kill(tbl, entry, hpa, direction, false);
if (!ret && ((*direction == DMA_FROM_DEVICE) ||
(*direction == DMA_BIDIRECTIONAL)) &&
!mm_iommu_is_devmem(mm, *hpa, tbl->it_page_shift,
&size))
SetPageDirty(pfn_to_page(*hpa >> PAGE_SHIFT));
/* if (unlikely(ret))
pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
__func__, hwaddr, entry << tbl->it_page_shift,
hwaddr, ret); */
return ret;
}
EXPORT_SYMBOL_GPL(iommu_tce_xchg);
EXPORT_SYMBOL_GPL(iommu_tce_xchg_no_kill);
void iommu_tce_kill(struct iommu_table *tbl,
unsigned long entry, unsigned long pages)
{
if (tbl->it_ops->tce_kill)
tbl->it_ops->tce_kill(tbl, entry, pages, false);
}
EXPORT_SYMBOL_GPL(iommu_tce_kill);
int iommu_take_ownership(struct iommu_table *tbl)
{
......@@ -1049,7 +1052,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
* requires exchange() callback defined so if it is not
* implemented, we disallow taking ownership over the table.
*/
if (!tbl->it_ops->exchange)
if (!tbl->it_ops->xchg_no_kill)
return -EINVAL;
spin_lock_irqsave(&tbl->large_pool.lock, flags);
......
/* SPDX-License-Identifier: GPL-2.0 */
/*
* PowerPC ELF notes.
*
* Copyright 2019, IBM Corporation
*/
#include <linux/elfnote.h>
#include <asm/elfnote.h>
/*
* Ultravisor-capable bit (PowerNV only).
*
* Bit 0 indicates that the powerpc kernel binary knows how to run in an
* ultravisor-enabled system.
*
* In an ultravisor-enabled system, some machine resources are now controlled
* by the ultravisor. If the kernel is not ultravisor-capable, but it ends up
* being run on a machine with ultravisor, the kernel will probably crash
* trying to access ultravisor resources. For instance, it may crash in early
* boot trying to set the partition table entry 0.
*
* In an ultravisor-enabled system, a bootloader could warn the user or prevent
* the kernel from being run if the PowerPC ultravisor capability doesn't exist
* or the Ultravisor-capable bit is not set.
*/
#ifdef CONFIG_PPC_POWERNV
#define PPCCAP_ULTRAVISOR_BIT (1 << 0)
#else
#define PPCCAP_ULTRAVISOR_BIT 0
#endif
/*
* Add the PowerPC Capabilities in the binary ELF note. It is a bitmap that
* can be used to advertise kernel capabilities to userland.
*/
#define PPC_CAPABILITIES_BITMAP (PPCCAP_ULTRAVISOR_BIT)
ELFNOTE(PowerPC, PPC_ELFNOTE_CAPABILITIES,
.long PPC_CAPABILITIES_BITMAP)
......@@ -55,6 +55,7 @@
#include <asm/firmware.h>
#include <asm/dt_cpu_ftrs.h>
#include <asm/drmem.h>
#include <asm/ultravisor.h>
#include <mm/mmu_decl.h>
......@@ -702,6 +703,9 @@ void __init early_init_devtree(void *params)
#ifdef CONFIG_PPC_POWERNV
/* Some machines might need OPAL info for debugging, grab it now. */
of_scan_flat_dt(early_init_dt_scan_opal, NULL);
/* Scan tree for ultravisor feature */
of_scan_flat_dt(early_init_dt_scan_ultravisor, NULL);
#endif
#ifdef CONFIG_FA_DUMP
......
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Generic code to perform an ultravisor call.
*
* Copyright 2019, IBM Corporation.
*
*/
#include <asm/ppc_asm.h>
#include <asm/export.h>
_GLOBAL(ucall_norets)
EXPORT_SYMBOL_GPL(ucall_norets)
sc 2 /* Invoke the ultravisor */
blr /* Return r3 = status */
......@@ -416,7 +416,7 @@ static void kvmppc_clear_tce(struct mm_struct *mm, struct iommu_table *tbl,
unsigned long hpa = 0;
enum dma_data_direction dir = DMA_NONE;
iommu_tce_xchg(mm, tbl, entry, &hpa, &dir);
iommu_tce_xchg_no_kill(mm, tbl, entry, &hpa, &dir);
}
static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
......@@ -447,7 +447,8 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
unsigned long hpa = 0;
long ret;
if (WARN_ON_ONCE(iommu_tce_xchg(kvm->mm, tbl, entry, &hpa, &dir)))
if (WARN_ON_ONCE(iommu_tce_xchg_no_kill(kvm->mm, tbl, entry, &hpa,
&dir)))
return H_TOO_HARD;
if (dir == DMA_NONE)
......@@ -455,7 +456,7 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
ret = kvmppc_tce_iommu_mapped_dec(kvm, tbl, entry);
if (ret != H_SUCCESS)
iommu_tce_xchg(kvm->mm, tbl, entry, &hpa, &dir);
iommu_tce_xchg_no_kill(kvm->mm, tbl, entry, &hpa, &dir);
return ret;
}
......@@ -501,7 +502,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
if (mm_iommu_mapped_inc(mem))
return H_TOO_HARD;
ret = iommu_tce_xchg(kvm->mm, tbl, entry, &hpa, &dir);
ret = iommu_tce_xchg_no_kill(kvm->mm, tbl, entry, &hpa, &dir);
if (WARN_ON_ONCE(ret)) {
mm_iommu_mapped_dec(mem);
return H_TOO_HARD;
......@@ -579,6 +580,8 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
entry, ua, dir);
iommu_tce_kill(stit->tbl, entry, 1);
if (ret != H_SUCCESS) {
kvmppc_clear_tce(vcpu->kvm->mm, stit->tbl, entry);
goto unlock_exit;
......@@ -656,12 +659,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
*/
if (get_user(tce, tces + i)) {
ret = H_TOO_HARD;
goto unlock_exit;
goto invalidate_exit;
}
tce = be64_to_cpu(tce);
if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua))
return H_PARAMETER;
if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua)) {
ret = H_PARAMETER;
goto invalidate_exit;
}
list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
......@@ -671,13 +676,17 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if (ret != H_SUCCESS) {
kvmppc_clear_tce(vcpu->kvm->mm, stit->tbl,
entry);
goto unlock_exit;
goto invalidate_exit;
}
}
kvmppc_tce_put(stt, entry + i, tce);
}
invalidate_exit:
list_for_each_entry_lockless(stit, &stt->iommu_tables, next)
iommu_tce_kill(stit->tbl, entry, npages);
unlock_exit:
srcu_read_unlock(&vcpu->kvm->srcu, idx);
......@@ -716,7 +725,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
continue;
if (ret == H_TOO_HARD)
return ret;
goto invalidate_exit;
WARN_ON_ONCE(1);
kvmppc_clear_tce(vcpu->kvm->mm, stit->tbl, entry);
......@@ -726,6 +735,10 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift))
kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
return H_SUCCESS;
invalidate_exit:
list_for_each_entry_lockless(stit, &stt->iommu_tables, next)
iommu_tce_kill(stit->tbl, ioba >> stt->page_shift, npages);
return ret;
}
EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
......@@ -218,13 +218,14 @@ static long kvmppc_rm_ioba_validate(struct kvmppc_spapr_tce_table *stt,
return H_SUCCESS;
}
static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
static long iommu_tce_xchg_no_kill_rm(struct mm_struct *mm,
struct iommu_table *tbl,
unsigned long entry, unsigned long *hpa,
enum dma_data_direction *direction)
{
long ret;
ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
ret = tbl->it_ops->xchg_no_kill(tbl, entry, hpa, direction, true);
if (!ret && ((*direction == DMA_FROM_DEVICE) ||
(*direction == DMA_BIDIRECTIONAL))) {
......@@ -240,13 +241,20 @@ static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
return ret;
}
extern void iommu_tce_kill_rm(struct iommu_table *tbl,
unsigned long entry, unsigned long pages)
{
if (tbl->it_ops->tce_kill)
tbl->it_ops->tce_kill(tbl, entry, pages, true);
}
static void kvmppc_rm_clear_tce(struct kvm *kvm, struct iommu_table *tbl,
unsigned long entry)
{
unsigned long hpa = 0;
enum dma_data_direction dir = DMA_NONE;
iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir);
}
static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
......@@ -278,7 +286,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
unsigned long hpa = 0;
long ret;
if (iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir))
if (iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir))
/*
* real mode xchg can fail if struct page crosses
* a page boundary
......@@ -290,7 +298,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
ret = kvmppc_rm_tce_iommu_mapped_dec(kvm, tbl, entry);
if (ret)
iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir);
return ret;
}
......@@ -336,7 +344,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
return H_TOO_HARD;
ret = iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
ret = iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir);
if (ret) {
mm_iommu_mapped_dec(mem);
/*
......@@ -417,6 +425,8 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
stit->tbl, entry, ua, dir);
iommu_tce_kill_rm(stit->tbl, entry, 1);
if (ret != H_SUCCESS) {
kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
return ret;
......@@ -556,8 +566,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
ua = 0;
if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
return H_PARAMETER;
if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) {
ret = H_PARAMETER;
goto invalidate_exit;
}
list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
......@@ -567,13 +579,17 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if (ret != H_SUCCESS) {
kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl,
entry);
goto unlock_exit;
goto invalidate_exit;
}
}
kvmppc_rm_tce_put(stt, entry + i, tce);
}
invalidate_exit:
list_for_each_entry_lockless(stit, &stt->iommu_tables, next)
iommu_tce_kill_rm(stit->tbl, entry, npages);
unlock_exit:
if (rmap)
unlock_rmap(rmap);
......@@ -616,7 +632,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
continue;
if (ret == H_TOO_HARD)
return ret;
goto invalidate_exit;
WARN_ON_ONCE_RM(1);
kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
......@@ -626,7 +642,11 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift))
kvmppc_rm_tce_put(stt, ioba >> stt->page_shift, tce_value);
return H_SUCCESS;
invalidate_exit:
list_for_each_entry_lockless(stit, &stt->iommu_tables, next)
iommu_tce_kill_rm(stit->tbl, ioba >> stt->page_shift, npages);
return ret;
}
/* This can be called in either virtual mode or real mode */
......
......@@ -29,6 +29,7 @@
#include <asm/asm-compat.h>
#include <asm/feature-fixups.h>
#include <asm/cpuidle.h>
#include <asm/ultravisor-api.h>
/* Sign-extend HDEC if not on POWER9 */
#define EXTEND_HDEC(reg) \
......@@ -1085,16 +1086,10 @@ BEGIN_FTR_SECTION
END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld r5, VCPU_LR(r4)
ld r6, VCPU_CR(r4)
mtlr r5
mtcr r6
ld r1, VCPU_GPR(R1)(r4)
ld r2, VCPU_GPR(R2)(r4)
ld r3, VCPU_GPR(R3)(r4)
ld r5, VCPU_GPR(R5)(r4)
ld r6, VCPU_GPR(R6)(r4)
ld r7, VCPU_GPR(R7)(r4)
ld r8, VCPU_GPR(R8)(r4)
ld r9, VCPU_GPR(R9)(r4)
ld r10, VCPU_GPR(R10)(r4)
......@@ -1112,10 +1107,42 @@ BEGIN_FTR_SECTION
mtspr SPRN_HDSISR, r0
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
ld r6, VCPU_KVM(r4)
lbz r7, KVM_SECURE_GUEST(r6)
cmpdi r7, 0
ld r6, VCPU_GPR(R6)(r4)
ld r7, VCPU_GPR(R7)(r4)
bne ret_to_ultra
lwz r0, VCPU_CR(r4)
mtcr r0
ld r0, VCPU_GPR(R0)(r4)
ld r2, VCPU_GPR(R2)(r4)
ld r3, VCPU_GPR(R3)(r4)
ld r4, VCPU_GPR(R4)(r4)
HRFI_TO_GUEST
b .
/*
* Use UV_RETURN ultracall to return control back to the Ultravisor after
* processing an hypercall or interrupt that was forwarded (a.k.a. reflected)
* to the Hypervisor.
*
* All registers have already been loaded, except:
* R0 = hcall result
* R2 = SRR1, so UV can detect a synthesized interrupt (if any)
* R3 = UV_RETURN
*/
ret_to_ultra:
lwz r0, VCPU_CR(r4)
mtcr r0
ld r0, VCPU_GPR(R3)(r4)
mfspr r2, SPRN_SRR1
li r3, 0
ori r3, r3, UV_RETURN
ld r4, VCPU_GPR(R4)(r4)
sc 2
/*
* Enter the guest on a P9 or later system where we have exactly
......
......@@ -62,6 +62,7 @@
#include <asm/ps3.h>
#include <asm/pte-walk.h>
#include <asm/asm-prototypes.h>
#include <asm/ultravisor.h>
#include <mm/mmu_decl.h>
......@@ -1076,8 +1077,8 @@ void hash__early_init_mmu_secondary(void)
if (!cpu_has_feature(CPU_FTR_ARCH_300))
mtspr(SPRN_SDR1, _SDR1);
else
mtspr(SPRN_PTCR,
__pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
set_ptcr_when_no_uv(__pa(partition_tb) |
(PATB_SIZE_SHIFT - 12));
}
/* Initialize SLB */
slb_initialize();
......
......@@ -12,6 +12,8 @@
#include <asm/tlb.h>
#include <asm/trace.h>
#include <asm/powernv.h>
#include <asm/firmware.h>
#include <asm/ultravisor.h>
#include <mm/mmu_decl.h>
#include <trace/events/thp.h>
......@@ -205,25 +207,14 @@ void __init mmu_partition_table_init(void)
* 64 K size.
*/
ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
mtspr(SPRN_PTCR, ptcr);
set_ptcr_when_no_uv(ptcr);
powernv_set_nmmu_ptcr(ptcr);
}
void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
unsigned long dw1)
static void flush_partition(unsigned int lpid, bool radix)
{
unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
partition_tb[lpid].patb0 = cpu_to_be64(dw0);
partition_tb[lpid].patb1 = cpu_to_be64(dw1);
/*
* Global flush of TLBs and partition table caches for this lpid.
* The type of flush (hash or radix) depends on what the previous
* use of this partition ID was, not the new use.
*/
asm volatile("ptesync" : : : "memory");
if (old & PATB_HR) {
if (radix) {
asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
......@@ -237,6 +228,39 @@ void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
/* do we need fixup here ?*/
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
}
void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
unsigned long dw1)
{
unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
/*
* When ultravisor is enabled, the partition table is stored in secure
* memory and can only be accessed doing an ultravisor call. However, we
* maintain a copy of the partition table in normal memory to allow Nest
* MMU translations to occur (for normal VMs).
*
* Therefore, here we always update partition_tb, regardless of whether
* we are running under an ultravisor or not.
*/
partition_tb[lpid].patb0 = cpu_to_be64(dw0);
partition_tb[lpid].patb1 = cpu_to_be64(dw1);
/*
* If ultravisor is enabled, we do an ultravisor call to register the
* partition table entry (PATE), which also do a global flush of TLBs
* and partition table caches for the lpid. Otherwise, just do the
* flush. The type of flush (hash or radix) depends on what the previous
* use of the partition ID was, not the new use.
*/
if (firmware_has_feature(FW_FEATURE_ULTRAVISOR)) {
uv_register_pate(lpid, dw0, dw1);
pr_info("PATE registered by ultravisor: dw0 = 0x%lx, dw1 = 0x%lx\n",
dw0, dw1);
} else {
flush_partition(lpid, (old & PATB_HR));
}
}
EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
......
......@@ -27,6 +27,7 @@
#include <asm/sections.h>
#include <asm/trace.h>
#include <asm/uaccess.h>
#include <asm/ultravisor.h>
#include <trace/events/thp.h>
......@@ -650,8 +651,9 @@ void radix__early_init_mmu_secondary(void)
lpcr = mfspr(SPRN_LPCR);
mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
mtspr(SPRN_PTCR,
__pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
set_ptcr_when_no_uv(__pa(partition_tb) |
(PATB_SIZE_SHIFT - 12));
radix_init_amor();
}
......@@ -667,7 +669,7 @@ void radix__mmu_cleanup_all(void)
if (!firmware_has_feature(FW_FEATURE_LPAR)) {
lpcr = mfspr(SPRN_LPCR);
mtspr(SPRN_LPCR, lpcr & ~LPCR_UPRT);
mtspr(SPRN_PTCR, 0);
set_ptcr_when_no_uv(0);
powernv_set_nmmu_ptcr(0);
radix__flush_tlb_all();
}
......
......@@ -4,6 +4,7 @@ obj-y += idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
obj-y += ultravisor.o
obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o
obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
......
......@@ -675,6 +675,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
sprs.ptcr = mfspr(SPRN_PTCR);
sprs.rpr = mfspr(SPRN_RPR);
sprs.tscr = mfspr(SPRN_TSCR);
if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
sprs.ldbar = mfspr(SPRN_LDBAR);
sprs_saved = true;
......@@ -789,6 +790,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
mtspr(SPRN_MMCR0, sprs.mmcr0);
mtspr(SPRN_MMCR1, sprs.mmcr1);
mtspr(SPRN_MMCR2, sprs.mmcr2);
if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
mtspr(SPRN_LDBAR, sprs.ldbar);
mtspr(SPRN_SPRG3, local_paca->sprg_vdso);
......
......@@ -29,23 +29,23 @@ struct memcons {
static struct memcons *opal_memcons = NULL;
ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
ssize_t memcons_copy(struct memcons *mc, char *to, loff_t pos, size_t count)
{
const char *conbuf;
ssize_t ret;
size_t first_read = 0;
uint32_t out_pos, avail;
if (!opal_memcons)
if (!mc)
return -ENODEV;
out_pos = be32_to_cpu(READ_ONCE(opal_memcons->out_pos));
out_pos = be32_to_cpu(READ_ONCE(mc->out_pos));
/* Now we've read out_pos, put a barrier in before reading the new
* data it points to in conbuf. */
smp_rmb();
conbuf = phys_to_virt(be64_to_cpu(opal_memcons->obuf_phys));
conbuf = phys_to_virt(be64_to_cpu(mc->obuf_phys));
/* When the buffer has wrapped, read from the out_pos marker to the end
* of the buffer, and then read the remaining data as in the un-wrapped
......@@ -53,7 +53,7 @@ ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
if (out_pos & MEMCONS_OUT_POS_WRAP) {
out_pos &= MEMCONS_OUT_POS_MASK;
avail = be32_to_cpu(opal_memcons->obuf_size) - out_pos;
avail = be32_to_cpu(mc->obuf_size) - out_pos;
ret = memory_read_from_buffer(to, count, &pos,
conbuf + out_pos, avail);
......@@ -71,7 +71,7 @@ ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
}
/* Sanity check. The firmware should not do this to us. */
if (out_pos > be32_to_cpu(opal_memcons->obuf_size)) {
if (out_pos > be32_to_cpu(mc->obuf_size)) {
pr_err("OPAL: memory console corruption. Aborting read.\n");
return -EINVAL;
}
......@@ -86,6 +86,11 @@ ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
return ret;
}
ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
{
return memcons_copy(opal_memcons, to, pos, count);
}
static ssize_t opal_msglog_read(struct file *file, struct kobject *kobj,
struct bin_attribute *bin_attr, char *to,
loff_t pos, size_t count)
@@ -98,32 +103,48 @@ static struct bin_attribute opal_msglog_attr = {
 	.read = opal_msglog_read
 };
 
-void __init opal_msglog_init(void)
+struct memcons *memcons_init(struct device_node *node, const char *mc_prop_name)
 {
 	u64 mcaddr;
 	struct memcons *mc;
 
-	if (of_property_read_u64(opal_node, "ibm,opal-memcons", &mcaddr)) {
-		pr_warn("OPAL: Property ibm,opal-memcons not found, no message log\n");
-		return;
+	if (of_property_read_u64(node, mc_prop_name, &mcaddr)) {
+		pr_warn("%s property not found, no message log\n",
+			mc_prop_name);
+		goto out_err;
 	}
 
 	mc = phys_to_virt(mcaddr);
 	if (!mc) {
-		pr_warn("OPAL: memory console address is invalid\n");
-		return;
+		pr_warn("memory console address is invalid\n");
+		goto out_err;
 	}
 
 	if (be64_to_cpu(mc->magic) != MEMCONS_MAGIC) {
-		pr_warn("OPAL: memory console version is invalid\n");
-		return;
+		pr_warn("memory console version is invalid\n");
+		goto out_err;
 	}
 
-	/* Report maximum size */
-	opal_msglog_attr.size = be32_to_cpu(mc->ibuf_size) +
-		be32_to_cpu(mc->obuf_size);
+	return mc;
 
-	opal_memcons = mc;
+out_err:
+	return NULL;
+}
+
+u32 memcons_get_size(struct memcons *mc)
+{
+	return be32_to_cpu(mc->ibuf_size) + be32_to_cpu(mc->obuf_size);
+}
+
+void __init opal_msglog_init(void)
+{
+	opal_memcons = memcons_init(opal_node, "ibm,opal-memcons");
+	if (!opal_memcons) {
+		pr_warn("OPAL: memcons failed to load from ibm,opal-memcons\n");
+		return;
+	}
+
+	opal_msglog_attr.size = memcons_get_size(opal_memcons);
 }
 
 void __init opal_msglog_sysfs_init(void)
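
With the probe and reader logic factored out of opal_msglog_init(), any firmware that exposes a memcons-format buffer through a device-tree property can reuse memcons_init(), memcons_copy() and memcons_get_size(); the ultravisor message log later in this diff is the second user. A minimal hypothetical third user, assuming only the signatures added to powernv.h below (the "example,memcons" property name is illustrative, not from the patch):

static struct memcons *example_memcons;

static void __init example_msglog_init(struct device_node *node)
{
	/* "example,memcons" is an illustrative property, not defined by this series */
	example_memcons = memcons_init(node, "example,memcons");
	if (!example_memcons)
		return;

	pr_info("example memcons found, %u bytes\n",
		memcons_get_size(example_memcons));
}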
@@ -1939,26 +1939,12 @@ static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
 }
 
 #ifdef CONFIG_IOMMU_API
-static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
+/* Common for IODA1 and IODA2 */
+static int pnv_ioda_tce_xchg_no_kill(struct iommu_table *tbl, long index,
+		unsigned long *hpa, enum dma_data_direction *direction,
+		bool realmode)
 {
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, true);
-
-	if (!ret)
-		pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, false);
-
-	return ret;
-}
-
-static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
-{
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, false);
-
-	if (!ret)
-		pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, true);
-
-	return ret;
+	return pnv_tce_xchg(tbl, index, hpa, direction, !realmode);
 }
 #endif
 
@@ -1973,8 +1959,8 @@ static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
 static struct iommu_table_ops pnv_ioda1_iommu_ops = {
 	.set = pnv_ioda1_tce_build,
 #ifdef CONFIG_IOMMU_API
-	.exchange = pnv_ioda1_tce_xchg,
-	.exchange_rm = pnv_ioda1_tce_xchg_rm,
+	.xchg_no_kill = pnv_ioda_tce_xchg_no_kill,
+	.tce_kill = pnv_pci_p7ioc_tce_invalidate,
 	.useraddrptr = pnv_tce_useraddrptr,
 #endif
 	.clear = pnv_ioda1_tce_free,
@@ -2103,30 +2089,6 @@ static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
 	return ret;
 }
 
-#ifdef CONFIG_IOMMU_API
-static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
-{
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, true);
-
-	if (!ret)
-		pnv_pci_ioda2_tce_invalidate(tbl, index, 1, false);
-
-	return ret;
-}
-
-static int pnv_ioda2_tce_xchg_rm(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
-{
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, false);
-
-	if (!ret)
-		pnv_pci_ioda2_tce_invalidate(tbl, index, 1, true);
-
-	return ret;
-}
-#endif
-
 static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
 		long npages)
 {
@@ -2138,8 +2100,8 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
 static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.set = pnv_ioda2_tce_build,
 #ifdef CONFIG_IOMMU_API
-	.exchange = pnv_ioda2_tce_xchg,
-	.exchange_rm = pnv_ioda2_tce_xchg_rm,
+	.xchg_no_kill = pnv_ioda_tce_xchg_no_kill,
+	.tce_kill = pnv_pci_ioda2_tce_invalidate,
 	.useraddrptr = pnv_tce_useraddrptr,
 #endif
 	.clear = pnv_ioda2_tce_free,
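
Both IODA variants now share one exchange helper, and the invalidation ("kill") step moves into its own callback. The point of the split is batching: a caller can update many TCEs through ->xchg_no_kill() and flush them with a single ->tce_kill() over the whole range, where the old .exchange/.exchange_rm callbacks invalidated one entry at a time. A sketch of the resulting caller contract, with the callback signatures assumed from the hunks above:

/* Sketch only: update a range of TCEs, then invalidate once. */
static long map_range(struct iommu_table *tbl, long entry,
		      unsigned long *hpas, enum dma_data_direction dir,
		      long pages)
{
	long i, ret = 0;

	for (i = 0; i < pages; i++) {
		enum dma_data_direction d = dir;

		/* false = virtual mode; true would be the old _rm path */
		ret = tbl->it_ops->xchg_no_kill(tbl, entry + i, &hpas[i],
						&d, false);
		if (ret)
			break;
	}

	/* One invalidation covers everything updated above. */
	tbl->it_ops->tce_kill(tbl, entry, i, false);
	return ret;
}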
@@ -30,4 +30,9 @@ extern void opal_event_shutdown(void);
 
 bool cpu_core_split_required(void);
 
+struct memcons;
+ssize_t memcons_copy(struct memcons *mc, char *to, loff_t pos, size_t count);
+u32 memcons_get_size(struct memcons *mc);
+struct memcons *memcons_init(struct device_node *node, const char *mc_prop_name);
+
 #endif /* _POWERNV_H */
// SPDX-License-Identifier: GPL-2.0
/*
 * Ultravisor high level interfaces
 *
 * Copyright 2019, IBM Corporation.
 *
 */
#include <linux/init.h>
#include <linux/printk.h>
#include <linux/of_fdt.h>
#include <linux/of.h>

#include <asm/ultravisor.h>
#include <asm/firmware.h>
#include <asm/machdep.h>

#include "powernv.h"

static struct kobject *ultravisor_kobj;

int __init early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
					 int depth, void *data)
{
	if (!of_flat_dt_is_compatible(node, "ibm,ultravisor"))
		return 0;

	powerpc_firmware_features |= FW_FEATURE_ULTRAVISOR;
	pr_debug("Ultravisor detected!\n");
	return 1;
}

static struct memcons *uv_memcons;

static ssize_t uv_msglog_read(struct file *file, struct kobject *kobj,
			      struct bin_attribute *bin_attr, char *to,
			      loff_t pos, size_t count)
{
	return memcons_copy(uv_memcons, to, pos, count);
}

static struct bin_attribute uv_msglog_attr = {
	.attr = {.name = "msglog", .mode = 0400},
	.read = uv_msglog_read
};

static int __init uv_init(void)
{
	struct device_node *node;

	if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
		return 0;

	node = of_find_compatible_node(NULL, NULL, "ibm,uv-firmware");
	if (!node)
		return -ENODEV;

	uv_memcons = memcons_init(node, "memcons");
	if (!uv_memcons)
		return -ENOENT;

	uv_msglog_attr.size = memcons_get_size(uv_memcons);

	ultravisor_kobj = kobject_create_and_add("ultravisor", firmware_kobj);
	if (!ultravisor_kobj)
		return -ENOMEM;

	return sysfs_create_bin_file(ultravisor_kobj, &uv_msglog_attr);
}
machine_subsys_initcall(powernv, uv_init);
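
uv_init() hangs the ultravisor message log off firmware_kobj, so it surfaces as /sys/firmware/ultravisor/msglog; mode 0400 makes it root-only. A minimal userspace sketch for dumping it (the path is derived from the code above, not stated in the patch):

#include <stdio.h>

int main(void)
{
	/* Path derived from kobject_create_and_add("ultravisor", firmware_kobj) */
	FILE *f = fopen("/sys/firmware/ultravisor/msglog", "r");
	char buf[4096];
	size_t n;

	if (!f) {
		perror("msglog");
		return 1;
	}
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
	return 0;
}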
@@ -621,7 +621,8 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 
 #ifdef CONFIG_IOMMU_API
 static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
-		long *tce, enum dma_data_direction *direction)
+		long *tce, enum dma_data_direction *direction,
+		bool realmode)
 {
 	long rc;
 	unsigned long ioba = (unsigned long) index << tbl->it_page_shift;
@@ -649,7 +650,7 @@ static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
 struct iommu_table_ops iommu_table_lpar_multi_ops = {
 	.set = tce_buildmulti_pSeriesLP,
 #ifdef CONFIG_IOMMU_API
-	.exchange = tce_exchange_pseries,
+	.xchg_no_kill = tce_exchange_pseries,
 #endif
 	.clear = tce_freemulti_pSeriesLP,
 	.get = tce_get_pSeriesLP
@@ -435,7 +435,7 @@ static int tce_iommu_clear(struct tce_container *container,
 	unsigned long oldhpa;
 	long ret;
 	enum dma_data_direction direction;
-	unsigned long lastentry = entry + pages;
+	unsigned long lastentry = entry + pages, firstentry = entry;
 
 	for ( ; entry < lastentry; ++entry) {
 		if (tbl->it_indirect_levels && tbl->it_userspace) {
@@ -460,7 +460,7 @@ static int tce_iommu_clear(struct tce_container *container,
 
 		direction = DMA_NONE;
 		oldhpa = 0;
-		ret = iommu_tce_xchg(container->mm, tbl, entry, &oldhpa,
+		ret = iommu_tce_xchg_no_kill(container->mm, tbl, entry, &oldhpa,
 				&direction);
 		if (ret)
 			continue;
@@ -476,6 +476,8 @@ static int tce_iommu_clear(struct tce_container *container,
 		tce_iommu_unuse_page(container, oldhpa);
 	}
 
+	iommu_tce_kill(tbl, firstentry, pages);
+
 	return 0;
 }
@@ -518,8 +520,8 @@ static long tce_iommu_build(struct tce_container *container,
 
 		hpa |= offset;
 		dirtmp = direction;
-		ret = iommu_tce_xchg(container->mm, tbl, entry + i, &hpa,
-				&dirtmp);
+		ret = iommu_tce_xchg_no_kill(container->mm, tbl, entry + i,
+				&hpa, &dirtmp);
 		if (ret) {
 			tce_iommu_unuse_page(container, hpa);
 			pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
@@ -536,6 +538,8 @@ static long tce_iommu_build(struct tce_container *container,
 
 	if (ret)
 		tce_iommu_clear(container, tbl, entry, i);
+	else
+		iommu_tce_kill(tbl, entry, pages);
 
 	return ret;
 }
@@ -572,8 +576,8 @@ static long tce_iommu_build_v2(struct tce_container *container,
 		if (mm_iommu_mapped_inc(mem))
 			break;
 
-		ret = iommu_tce_xchg(container->mm, tbl, entry + i, &hpa,
-				&dirtmp);
+		ret = iommu_tce_xchg_no_kill(container->mm, tbl, entry + i,
+				&hpa, &dirtmp);
 		if (ret) {
 			/* dirtmp cannot be DMA_NONE here */
 			tce_iommu_unuse_page_v2(container, tbl, entry + i);
@@ -593,6 +597,8 @@ static long tce_iommu_build_v2(struct tce_container *container,
 
 	if (ret)
 		tce_iommu_clear(container, tbl, entry, i);
+	else
+		iommu_tce_kill(tbl, entry, pages);
 
 	return ret;
 }
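
The VFIO changes follow the same batching pattern: tce_iommu_clear(), tce_iommu_build() and tce_iommu_build_v2() now exchange entries with iommu_tce_xchg_no_kill() and issue one iommu_tce_kill() per request (hence the new firstentry bookkeeping in the clear path). Schematically, with error handling omitted:

	/* Before: every exchange invalidated its TCE immediately. */
	for (i = 0; i < pages; i++)
		ret = iommu_tce_xchg(container->mm, tbl, entry + i,
				     &hpa, &dirtmp);

	/* After: exchanges are cheap, one kill flushes the whole range. */
	for (i = 0; i < pages; i++)
		ret = iommu_tce_xchg_no_kill(container->mm, tbl, entry + i,
					     &hpa, &dirtmp);
	iommu_tce_kill(tbl, entry, pages);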