GIPC: Introduce G-stage page table In Process Context (GIPC) capability #413

125 changes: 110 additions & 15 deletions src/iommu_data_structures.adoc
Expand Up @@ -19,7 +19,7 @@ then only the first-stage suffices to perform necessary address translations and
protections; the second-stage scheme may be effectively disabled for the device by
programming the second-stage address translation scheme to be `Bare`.

When second-stage address translation is not Bare, the `DC` holds the PPN of the
When second-stage address translation is not Bare, the `DC` or `PC` holds the PPN of the
root second-stage page table; a guest-soft-context-ID (`GSCID`), which
facilitates invalidation of cached address translations on a per-virtual-machine
basis; and the second-stage address translation scheme.
Expand All @@ -40,6 +40,9 @@ a data structure called the Process Context (`PC`).
When a PDT is active, the controls for first-stage address translation are held
in the (`PC`).

When a PDT is active with `DC.tc.GIPC = 1`, the controls for first-stage
and second-stage address translation are held in the (`PC`).

When a PDT is not active, the controls for first-stage address translation are
held in the `DC` itself.

Expand Down Expand Up @@ -104,12 +107,21 @@ traverse the DDT radix-tree are as follows:
], config:{lanes: 1, hspace:1024, fontsize: 16}}
....

Three formats of the process-context structure are supported:

* *Base Format* - A 16-byte process context is used when `DC.tc.GIPC = 0` and
`capabilities.MSI_FLAT = 0`.

* *Extended Format* - A 32-byte process context is used when `DC.tc.GIPC = 1`
and `capabilities.MSI_FLAT = 0`.

* *Extended Format with MSI page table* - A 64-byte process context is used
when `DC.tc.GIPC = 1` and `capabilities.MSI_FLAT = 1`.
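
As a minimal sketch of the format selection listed above, the mapping from the two control bits to the `PC` size can be written as a lookup. The function name `pc_size_bytes` and the error raised for the unlisted combination are illustrative assumptions, not part of the proposed text.

```python
# Hypothetical helper; the name and the error handling are assumptions made
# for illustration only.
def pc_size_bytes(gipc: int, msi_flat: int) -> int:
    """Return the process-context size in bytes for the three listed formats."""
    sizes = {
        (0, 0): 16,  # Base format
        (1, 0): 32,  # Extended format
        (1, 1): 64,  # Extended format with MSI page table
    }
    key = (gipc, msi_flat)
    if key not in sizes:
        # The list above does not name a format for this combination.
        raise ValueError(f"no PC format listed for GIPC={gipc}, MSI_FLAT={msi_flat}")
    return sizes[key]
```

The combination `DC.tc.GIPC = 0` with `capabilities.MSI_FLAT = 1` is not covered by the list, so the sketch rejects it rather than guessing a size.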

The PDT may be configured to be a 1, 2, or 3 level radix-tree depending on the
maximum width of the `process_id` supported by that device. The partitioning
of the `process_id` to obtain the process directory indices (PDI) to traverse
the PDT radix-tree are as follows:

.`process_id` partitioning for PDT radix-tree traversal
.Base format `process_id` partitioning for PDT radix-tree traversal

[wavedrom, , ]
....
Expand All @@ -119,6 +131,29 @@ the PDT radix-tree are as follows:
{bits: 3, name: 'PDI[2]'},
], config:{lanes: 1, hspace:1024, fontsize: 16}}
....

.Extended format `process_id` partitioning for PDT radix-tree traversal

[wavedrom, , ]
....
{reg: [
{bits: 7, name: 'PDI[0]'},
{bits: 9, name: 'PDI[1]'},
{bits: 4, name: 'PDI[2]'},
], config:{lanes: 1, hspace:1024, fontsize: 16}}
....

.Extended format with MSI page table `process_id` partitioning for PDT radix-tree traversal

[wavedrom, , ]
....
{reg: [
{bits: 6, name: 'PDI[0]'},
{bits: 9, name: 'PDI[1]'},
{bits: 5, name: 'PDI[2]'},
], config:{lanes: 1, hspace:1024, fontsize: 16}}
....
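
The three partitionings above can be sketched as a table of field widths plus a generic splitter. This is an illustration under stated assumptions: the format names and the function `split_process_id` are hypothetical, and the base-format widths of `PDI[0]` and `PDI[1]` (8 and 9 bits) come from the published specification, because the diff context shows only `PDI[2]` for that format.

```python
from typing import Tuple

# Widths in bits of (PDI[0], PDI[1], PDI[2]); PDI[0] occupies the
# least-significant bits of process_id. Format names are hypothetical labels.
PDI_WIDTHS = {
    "base": (8, 9, 3),          # 16-byte PC: 256 entries per 4 KiB leaf page
    "extended": (7, 9, 4),      # 32-byte PC: 128 entries per 4 KiB leaf page
    "extended_msi": (6, 9, 5),  # 64-byte PC: 64 entries per 4 KiB leaf page
}


def split_process_id(process_id: int, fmt: str) -> Tuple[int, int, int]:
    """Split a 20-bit process_id into (PDI[0], PDI[1], PDI[2])."""
    if not 0 <= process_id < (1 << 20):
        raise ValueError("process_id must fit in 20 bits")
    w0, w1, w2 = PDI_WIDTHS[fmt]
    pdi0 = process_id & ((1 << w0) - 1)
    pdi1 = (process_id >> w0) & ((1 << w1) - 1)
    pdi2 = (process_id >> (w0 + w1)) & ((1 << w2) - 1)
    return pdi0, pdi1, pdi2
```

In every format the three widths sum to 20 bits, and the width of `PDI[0]` shrinks as the `PC` grows so that a leaf PDT page still fits in 4 KiB.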

[NOTE]
====
The `process_id` partitioning is designed to require a maximum of 4 KiB, a
Expand Down Expand Up @@ -269,7 +304,8 @@ order.
{bits: 1, name: 'DPE'},
{bits: 1, name: 'SBE'},
{bits: 1, name: 'SXL'},
{bits: 12, name: 'reserved'},
{bits: 1, name: 'GIPC'},
{bits: 11, name: 'reserved'},
{bits: 8, name: 'custom'},
{bits: 32, name: 'reserved'},
], config:{lanes: 4, hspace: 1024, fontsize: 16, fontsize: 16}}
Expand Down Expand Up @@ -379,6 +415,10 @@ default value of `process_id` for translating requests without a valid
`process_id`. When `PDTV` is 0, the `DPE` bit is reserved for future standard
extension.

The `GIPC` bit is set to 1 when the `iohgatp` and `msiptp` fields held in the
`PC` are used instead of `DC.iohgatp` and `DC.msiptp`. The IOMMU supports the 1
setting of the `GIPC` bit if `capabilities.GIPC` is 1.

The IOMMU supports the 1 setting of `GADE` and `SADE` bits if
`capabilities.AMO_HWAD` is 1. When `capabilities.AMO_HWAD` is 0, these bits are
reserved.
Expand Down Expand Up @@ -421,6 +461,8 @@ When `SXL` is 1, the following rules apply:
], config:{lanes: 2, hspace: 1024, fontsize: 16}}
....

When `DC.tc.PDTV` is set and `DC.tc.GIPC = 1`, the `DC.iohgatp` field is
ignored, and `PC.iohgatp` is used instead. Otherwise, `DC.iohgatp` is used.
The `iohgatp` field holds the PPN of the root second-stage page table and a
virtual machine identified by a guest soft-context ID (`GSCID`), to facilitate
address-translation fences on a per-virtual-machine basis. If multiple devices
Expand Down Expand Up @@ -562,14 +604,20 @@ determines the number of levels of the PDT.
], config:{lanes: 2, hspace: 1024, fontsize: 16}}
....

When second-stage address translation is not Bare, the `pdtp.PPN` field holds a
When second-stage address translation is not Bare and `DC.tc.GIPC = 0`, the `pdtp.PPN` field holds a
guest PPN. The GPA of the root PDT is then converted by guest physical address
translation process, as controlled by the `iohgatp`, into a supervisor physical
address. Translating addresses of PDT using a second-stage page table, allows the
PDT to be held in memory allocated by the guest OS and allows the guest OS to
directly edit the PDT to associate a virtual-address space identified by a
first-stage page table with a `process_id`.

When second-stage address translation is not Bare and `DC.tc.GIPC = 1`,
the `pdtp.PPN` field holds a supervisor PPN. Because the root PDT page is located
by a supervisor physical address, the PDT resides in the supervisor physical
address space. The guest OS may then use an interface such as the virtio-IOMMU
API to have the hypervisor edit the PDT and associate a virtual-address space
identified by a VS-stage page table with a `process_id`.

[[PDTP_MODE_ENC]]
.Encoding of `pdtp.MODE` field
[width=75%]
Expand Down Expand Up @@ -775,9 +823,9 @@ A valid (`V==1`) non-leaf PDT entry holds the PPN of the next-level PDT.

==== Leaf PDT entry
The leaf PDT page is indexed by `PDI[0]` and holds the 16-byte process-context
(`PC`).
(`PC`) when `DC.tc.GIPC = 0` and `capabilities.MSI_FLAT = 0`.

.Process-context
.Base-format process-context

[wavedrom, , ]
....
Expand All @@ -787,7 +835,41 @@ The leaf PDT page is indexed by `PDI[0]` and holds the 16-byte process-context
], config:{lanes: 2, hspace: 1024, fontsize: 16}}
....

The `PC` is interpreted as two 64-bit doublewords. The byte order of each of the
The leaf PDT page is indexed by `PDI[0]` and holds the 32-byte process-context
(`PC`) when `DC.tc.GIPC = 1` and `capabilities.MSI_FLAT = 0`.

.Extended-format process-context

[wavedrom, , ]
....
{reg: [
{bits: 64, name: 'reserved'},
{bits: 64, name: 'IO Hyp. guest addr. translation and prot. (iohgatp)'},
{bits: 64, name: 'Translation-attributes (ta)'},
{bits: 64, name: 'First-stage-context (fsc)'},
], config:{lanes: 2, hspace: 1024, fontsize: 16}}
....

The leaf PDT page is indexed by `PDI[0]` and holds the 64-byte process-context
(`PC`) when `DC.tc.GIPC = 1` and `capabilities.MSI_FLAT = 1`.

.Extended-format process-context with MSI-page-table

[wavedrom, , ]
....
{reg: [
{bits: 64, name: 'reserved'},
{bits: 64, name: 'MSI-page-table pointer (msiptp)'},
{bits: 64, name: 'MSI-address-mask (msi_addr_mask)'},
{bits: 64, name: 'MSI-address-pattern (msi_addr_pattern)'},
{bits: 64, name: 'reserved'},
{bits: 64, name: 'IO Hyp. guest addr. translation and prot. (iohgatp)'},
{bits: 64, name: 'Translation-attributes (ta)'},
{bits: 64, name: 'First-stage-context (fsc)'},
], config:{lanes: 2, hspace: 1024, fontsize: 16}}
....

The `PC` is interpreted as two, four, or eight 64-bit doublewords, depending on its format. The byte order of each of the
doublewords in memory, little-endian or big-endian, is the endianness as
determined by `DC.tc.SBE`. The IOMMU may read the `PC` fields in any order.

Expand Down Expand Up @@ -849,14 +931,18 @@ field controls the supported paged virtual-memory schemes. When `PC.fsc.MODE` is
not `Bare`, the `PC.fsc.PPN` field holds the PPN of the root page of a
first-stage page table.

When second-stage address translation is not Bare, the `PC.fsc.PPN` field holds
When second-stage address translation is not Bare and `DC.tc.GIPC = 0`, the `PC.fsc.PPN` field holds
a guest PPN of the root of a first-stage page table. Addresses of the first-stage
page table entries are then converted by guest physical address translation
process, as controlled by the `DC.iohgatp`, into a supervisor physical address.
A guest OS may thus directly edit the first-stage page table to limit access by
the device to a subset of its memory and specify permissions for the device
accesses.

When second-stage address translation is not Bare and `DC.tc.GIPC = 1`, the `PC.fsc.PPN` field holds
a supervisor PPN of the root of a first-stage page table. A guest OS may edit
the first-stage page table with the help of the hypervisor.

[NOTE]
====
The `PC.ta.PSCID` identifies an address space. If an identical
Expand All @@ -866,6 +952,15 @@ the first page table or the second page table. These are the only expected
behaviors.
====

===== IO hypervisor guest address translation and protection (`iohgatp`)
This field has the same format and function as `DC.iohgatp`.

===== MSI page table pointer (`msiptp`)
This field has the same format and function as `DC.msiptp`.

===== MSI address mask (`msi_addr_mask`) and pattern (`msi_addr_pattern`)
These fields have the same format and function as `DC.msi_addr_mask` and
`DC.msi_addr_pattern`.

[[PC_MISCONFIG]]
==== Process-context configuration checks

Expand Down Expand Up @@ -1039,9 +1134,9 @@ is as follows:
==== Process to locate the Process-context

The device-context provides the PDT root page PPN (`pdtp.ppn`). When
`DC.iohgatp.mode` is not `Bare`, `pdtp.PPN` as well as `pdte.PPN` are Guest
`DC/PC.iohgatp.mode` is not `Bare`, `pdtp.PPN` as well as `pdte.PPN` are Guest
Physical Addresses (GPA) which must be translated into Supervisor Physical
Addresses (SPA) using the second-stage page table pointed to by `DC.iohgatp`.
Addresses (SPA) using the second-stage page table pointed to by `DC/PC.iohgatp`.
The memory accesses to the PDT are treated as implicit read memory accesses
by the second-stage.

Expand All @@ -1051,7 +1146,7 @@ The process to locate the Process-context for a transaction using its
. Let `a` be `pdtp.PPN x 2^12^` and let `i = LEVELS - 1`. When
`pdtp.MODE` is `PD20`, `LEVELS` is three. When `pdtp.MODE` is
`PD17`, `LEVELS` is two. When `pdtp.MODE` is `PD8`, `LEVELS` is one.
. If `DC.iohgatp.mode != Bare`, then `a` is a GPA. Invoke the process
. If `DC/PC.iohgatp.mode != Bare`, then `a` is a GPA. Invoke the process
to translate `a` to a SPA as an implicit memory access. If faults occur during
second-stage address translation of `a` then stop and report the fault detected
by the second-stage address translation process. The translated `a` is used in
Expand All @@ -1066,7 +1161,7 @@ The process to locate the Process-context for a transaction using its
. If any bits or encoding that are reserved for future standard use are
set within `pdte`, stop and report "PDT entry misconfigured" (cause = 267).
. Let `i = i - 1` and let `a = pdte.PPN x 2^12^`. Go to step 2.
. Let `PC` be the value of the 16-bytes at address `a + PDI[0] x 16`. If accessing `PC`
. Let `PC` be the value of the 16, 32, or 64 bytes (according to the process-context format) at address `a + PDI[0] x 16`, `a + PDI[0] x 32`, or `a + PDI[0] x 64`, respectively. If accessing `PC`
violates a PMA or PMP check, then stop and report "PDT entry load access
fault" (cause = 265). If `PC` access detects a data corruption
(a.k.a. poisoned data), then stop and report "PDT data corruption"
Expand Down Expand Up @@ -1120,7 +1215,7 @@ file and translating the address using the MSI page table is as follows:
** `y = 1 0 1 0 0 1 1 0`
** then the value of `extract(x, y)` has bits `0 0 0 0 a c f g`.

. Let `m` be `(DC.msiptp.PPN x 2^12^)`.
. Let `m` be `(DC/PC.msiptp.PPN x 2^12^)`.
. Let `msipte` be the value of sixteen bytes at address `(m | (I x 16))`. If
accessing `msipte` violates a PMA or PMP check, then stop and report
"MSI PTE load access fault" (cause = 261).
Expand Down Expand Up @@ -1169,8 +1264,8 @@ operations may be performed by the I/O bridge.

When `capabilities.AMO_HWAD` is 1, the IOMMU supports updating the A and D bits in
PTEs atomically. When updating of A and D bits in second-stage PTEs is enabled
(`DC.tc.GADE=1`) and/or updating of A and D bits in first-stage PTEs is enabled
(`DC.tc.SADE=1`) the following rules apply:
(`DC/PC.tc.GADE=1`) and/or updating of A and D bits in first-stage PTEs is enabled
(`DC/PC.tc.SADE=1`) the following rules apply:

. The A and/or D bit updates by the IOMMU must follow the rules specified by the
Privileged specification for validity, permission checking, and atomicity.
7 changes: 4 additions & 3 deletions src/iommu_intro.adoc
Expand Up @@ -103,7 +103,7 @@ accesses. The IOMMU may employ similar address translation caches, referred as
IOMMU Address Translation Cache (IOATC). The IOMMU provides mechanisms for
software to synchronize the IOATC with the memory resident data structures used
for address translation when they are modified. Software may configure the
device context with a software defined context identifier called guest
device/process context with a software defined context identifier called guest
soft-context identifier (`GSCID`) to indicate that a collection of devices are
assigned to the same VM and thus access a common virtual address space.
Software may configure the process context with a software defined context
Expand Down Expand Up @@ -139,10 +139,10 @@ would naturally be subject to the same address translation that an IOMMU
applies to other memory writes. However, the RISC-V Advanced Interrupt
Architecture cite:[AIA] requires that IOMMUs treat MSIs directed to virtual
machines specially, in part to simplify software, and in part to allow optional
support for memory-resident interrupt files. The device context is configured by
support for memory-resident interrupt files. The device/process context is configured by
software with parameters to identify memory accesses to a virtual interrupt file
and to be translated using a MSI address translation table configured by software
in the device context.
in the device/process context.

=== Glossary
.Terms and definitions
Expand Down Expand Up @@ -170,6 +170,7 @@ in the device context.
| DMA | Direct Memory Access.
| GPA | Guest Physical Address: An address in the virtualized
physical memory space of a virtual machine.
| GIPC | G-stage page table In Process Context.
| GSCID | Guest soft-context identifier: An identification number used
by software to uniquely identify a collection of devices
assigned to a virtual machine. An IOMMU may tag IOATC
6 changes: 4 additions & 2 deletions src/iommu_registers.adoc
Expand Up @@ -151,7 +151,8 @@ the IOMMU. At reset, the register shall contain the IOMMU supported features.
{bits: 1, name: 'PD8'},
{bits: 1, name: 'PD17'},
{bits: 1, name: 'PD20'},
{bits: 15, name: 'reserved'},
{bits: 1, name: 'GIPC'},
{bits: 14, name: 'reserved'},
{bits: 8, name: 'custom'},
], config:{lanes: 8, hspace:1024}}
....
Expand Down Expand Up @@ -221,7 +222,8 @@ the IOMMU. At reset, the register shall contain the IOMMU supported features.
|38 |`PD8` |RO | One level PDT with 8-bit process_id supported.
|39 |`PD17` |RO | Two level PDT with 17-bit process_id supported.
|40 |`PD20` |RO | Three level PDT with 20-bit process_id supported.
|55:41 | reserved |RO | Reserved for standard use
|41 |`GIPC` |RO | G-stage page table In Process Context supported.
|55:42 | reserved |RO | Reserved for standard use
|63:56 |_custom_ |RO | _Designated for custom use_
|===
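
A short sketch of how software might test the new capability bit, based on the bit positions in the table above. The constant and function names are hypothetical; only the bit numbers come from the table.

```python
# Bit positions taken from the capabilities table; names are illustrative.
CAP_PD8 = 1 << 38
CAP_PD17 = 1 << 39
CAP_PD20 = 1 << 40
CAP_GIPC = 1 << 41  # bit introduced by this change


def supports_gipc(capabilities: int) -> bool:
    """Return True if the 64-bit capabilities value reports GIPC support."""
    return bool(capabilities & CAP_GIPC)
```

Software would be expected to check this bit before setting `DC.tc.GIPC` to 1.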

23 changes: 12 additions & 11 deletions src/iommu_sw_guidelines.adoc
Expand Up @@ -146,9 +146,9 @@ device with `device_id = D`) then the following invalidations must be performed:
* `IODIR.INVAL_DDT` with `DV=1` and `DID=D`
* If `DC.tc.PDTV==1` then `IODIR.INVAL_PDT` with `DV=1`, `PV=0`, and `DID=D`

* If `DC.iohgatp.MODE != Bare`
** `IOTINVAL.VMA` with `GV=1`, `AV=PSCV=0`, and `GSCID=DC.iohgatp.GSCID`
** `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC.iohgatp.GSCID`
* If `DC/PC.iohgatp.MODE != Bare`
** `IOTINVAL.VMA` with `GV=1`, `AV=PSCV=0`, and `GSCID=DC/PC.iohgatp.GSCID`
** `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC/PC.iohgatp.GSCID`
* else
** If `DC.tc.PDTV==1 || DC.tc.PDTV == 0 && DC.fsc.MODE == Bare`
*** `IOTINVAL.VMA` with `GV=AV=PSCV=0`
Expand All @@ -171,8 +171,8 @@ If software changes a leaf-level PDT entry (i.e, a process context (`PC`), for
performed:

* `IODIR.INVAL_PDT` with `DV=1`, `PV=1`, `DID=D` and `PID=P`
* If `DC.iohgatp.MODE != Bare`
** `IOTINVAL.VMA` with `GV=1`, `AV=0`, `PV=1`, `GSCID=DC.iohgatp.GSCID`,
* If `DC/PC.iohgatp.MODE != Bare`
** `IOTINVAL.VMA` with `GV=1`, `AV=0`, `PV=1`, `GSCID=DC/PC.iohgatp.GSCID`,
and `PSCID=PC.PSCID`
* else
** `IOTINVAL.VMA` with `GV=0`, `AV=0`, `PV=1`, and `PSCID=PC.PSCID`
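
The invalidation sequence for a changed `PC` can be sketched as below. This is an illustration only: commands are modelled as plain dictionaries, the function and parameter names are hypothetical, and the choice between `DC.iohgatp` and `PC.iohgatp` follows the rule stated in the data-structures change (the `PC` copy is used when `DC.tc.GIPC = 1`). The operand names mirror the list above.

```python
BARE = 0  # encoding of iohgatp.MODE for Bare


def pc_change_invalidations(did, pid, pscid, gipc, dc_iohgatp, pc_iohgatp=None):
    """Return the commands to issue after changing the PC for (did, pid).

    dc_iohgatp and pc_iohgatp are dicts with "MODE" and "GSCID" keys
    (a hypothetical in-memory model, not a register layout).
    """
    # With GIPC set, the second-stage controls come from the PC, not the DC.
    iohgatp = pc_iohgatp if gipc else dc_iohgatp
    cmds = [{"op": "IODIR.INVAL_PDT", "DV": 1, "PV": 1, "DID": did, "PID": pid}]
    if iohgatp["MODE"] != BARE:
        cmds.append({"op": "IOTINVAL.VMA", "GV": 1, "AV": 0, "PV": 1,
                     "GSCID": iohgatp["GSCID"], "PSCID": pscid})
    else:
        cmds.append({"op": "IOTINVAL.VMA", "GV": 0, "AV": 0, "PV": 1,
                     "PSCID": pscid})
    return cmds
```

For example, calling it with `gipc=1` and a non-Bare `pc_iohgatp` yields an `IOTINVAL.VMA` command tagged with the `GSCID` taken from the `PC`.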
Expand All @@ -188,12 +188,12 @@ number `I` that corresponds to an untranslated MSI address `A` then the followin
invalidations must be performed:

* `IOTINVAL.GVMA` with `GV=AV=1`, `ADDR[63:12]=A[63:12]` and
`GSCID=DC.iohgatp.GSCID`
`GSCID=DC/PC.iohgatp.GSCID`

To invalidate all cache entries from a MSI page table the following
invalidations must be performed:

* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC.iohgatp.GSCID`
* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC/PC.iohgatp.GSCID`

Between a change to the MSI PTE and when an invalidation command to invalidate
the cached PTE is processed by the IOMMU, the IOMMU may use the old PTE value
Expand All @@ -207,19 +207,20 @@ If software changes a leaf second-stage page-table entry of a VM where the chang
affects translation for a guest-PPN `G` then the following invalidations must be
performed:

* `IOTINVAL.GVMA` with `GV=AV=1`, `GSCID=DC.iohgatp.GSCID`, and `ADDR[63:12]=G`
* `IOTINVAL.GVMA` with `GV=AV=1`, `GSCID=DC/PC.iohgatp.GSCID`, and `ADDR[63:12]=G`

If software changes a non-leaf second-stage page-table entry of a VM
then the following invalidations must be performed:

* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, `GSCID=DC.iohgatp.GSCID`
* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, `GSCID=DC/PC.iohgatp.GSCID`

The `DC` has fields that hold a guest-PPN. An implementation may translate such
The `DC` has fields that hold a guest-PPN when `DC.tc.GIPC = 0`. An implementation may translate such
fields to a supervisor-PPN as part of caching the `DC`. If the second-stage page
table update affects translation of guest-PPN held in the `DC` then software
must invalidate all such cached `DC` using `IODIR.INVAL_DDT` with `DV=1` and
`DID` set to the corresponding `device_id`. Alternatively, an
`IODIR.INVAL_DDT` with `DV=0` may be used to invalidate all cached `DC`.
When `DC.tc.GIPC = 1`, the `DC` has no fields that hold a guest-PPN.

Between a change to the second-stage PTE and when an invalidation command to
invalidate the cached PTE is processed by the IOMMU, the IOMMU may use the
Expand All @@ -238,7 +239,7 @@ specified in <<IVMA>>.

When a change is made to a first-stage page table, and the second-stage is
not Bare, then software must perform invalidations using `IOTINVAL.VMA` with
`GV=1`, `GSCID=DC.iohgatp.GSCID` and `AV` and `PSCV` operands appropriate for
`GV=1`, `GSCID=DC/PC.iohgatp.GSCID` and `AV` and `PSCV` operands appropriate for
the modification as specified in <<IVMA>>.

Between a change to the first-stage PTE and when an invalidation command to