diff --git a/src/iommu_data_structures.adoc b/src/iommu_data_structures.adoc index b54bb8a1..6b2f3deb 100644 --- a/src/iommu_data_structures.adoc +++ b/src/iommu_data_structures.adoc @@ -19,7 +19,7 @@ then only the first-stage suffices to perform necessary address translations and protections; the second-stage scheme may be effectively disabled for the device by programming the second-stage address translation scheme to be `Bare`. -When second-stage address translation is not Bare, the `DC` holds the PPN of the +When second-stage address translation is not Bare, the `DC` or `PC` holds the PPN of the root second-stage page table; a guest-soft-context-ID (`GSCID`), which facilitates invalidation of cached address translations on a per-virtual-machine basis; and the second-stage address translation scheme. @@ -40,6 +40,9 @@ a data structure called the Process Context (`PC`). When a PDT is active, the controls for first-stage address translation are held in the (`PC`). +When a PDT is active with `DC.tc.GIPC = 1`, the controls for first-stage +and second-stage address translation are held in the (`PC`). + When a PDT is not active, the controls for first-stage address translation are held in the `DC` itself. @@ -104,12 +107,21 @@ traverse the DDT radix-tree are as follows: ], config:{lanes: 1, hspace:1024, fontsize: 16}} .... +Three formats of the process-context structure are supported: +* *Base Format* - A 16-byte PC used when `DC.tc.GIPC = 0` and `capabilities.MSI_FLAT = 0` + +* *Extended Format* - In the extended format a 32-byte process context is used + when `DC.tc.GIPC = 1` and `capabilities.MSI_FLAT = 0`. + +* *Extended Format with MSI page table* - In the extended format with MSI page table a + 64-byte process context is used when `DC.tc.GIPC = 1` and `capabilities.MSI_FLAT = 1`. + The PDT may be configured to be a 1, 2, or 3 level radix-tree depending on the maximum width of the `process_id` supported by that device. 
The partitioning of the `process_id` to obtain the process directory indices (PDI) to traverse the PDT radix-tree are as follows: -.`process_id` partitioning for PDT radix-tree traversal +.Base format `process_id` partitioning for PDT radix-tree traversal [wavedrom, , ] .... @@ -119,6 +131,29 @@ the PDT radix-tree are as follows: {bits: 3, name: 'PDI[2]'}, ], config:{lanes: 1, hspace:1024, fontsize: 16}} .... + +.Extended format `process_id` partitioning for PDT radix-tree traversal + +[wavedrom, , ] +.... +{reg: [ + {bits: 7, name: 'PDI[0]'}, + {bits: 9, name: 'PDI[1]'}, + {bits: 4, name: 'PDI[2]'}, +], config:{lanes: 1, hspace:1024, fontsize: 16}} +.... + +.Extended format with MSI page table `process_id` partitioning for PDT radix-tree traversal + +[wavedrom, , ] +.... +{reg: [ + {bits: 6, name: 'PDI[0]'}, + {bits: 9, name: 'PDI[1]'}, + {bits: 5, name: 'PDI[2]'}, +], config:{lanes: 1, hspace:1024, fontsize: 16}} +.... + [NOTE] ==== The `process_id` partitioning is designed to require a maximum of 4 KiB, a @@ -269,7 +304,8 @@ order. {bits: 1, name: 'DPE'}, {bits: 1, name: 'SBE'}, {bits: 1, name: 'SXL'}, - {bits: 12, name: 'reserved'}, + {bits: 1, name: 'GIPC'}, + {bits: 11, name: 'reserved'}, {bits: 8, name: 'custom'}, {bits: 32, name: 'reserved'}, ], config:{lanes: 4, hspace: 1024, fontsize: 16, fontsize: 16}} @@ -379,6 +415,10 @@ default value of `process_id` for translating requests without a valid `process_id`. When `PDTV` is 0, the `DPE` bit is reserved for future standard extension. +The `GIPC` bit is expected to be set to 1 when `PC` is associated with iohgatp +and msiptp instead of `DC.iohgatp` and `DC.msiptp`. The `DC.tc.GIPC` depends on +`capabilities.GIPC`. + The IOMMU supports the 1 setting of `GADE` and `SADE` bits if `capabilities.AMO_HWAD` is 1. When `capabilities.AMO_HWAD` is 0, these bits are reserved. @@ -421,6 +461,8 @@ When `SXL` is 1, the following rules apply: ], config:{lanes: 2, hspace: 1024, fontsize: 16}} .... 
+When `DC.tc.PDTV` is set and `DC.tc.GIPC = 1`, the `DC.iohgatp` field is +ignored, and `PC.iohgatp` is used instead. Otherwise, `DC.iohgatp` is used. The `iohgatp` field holds the PPN of the root second-stage page table and a virtual machine identified by a guest soft-context ID (`GSCID`), to facilitate address-translation fences on a per-virtual-machine basis. If multiple devices @@ -562,7 +604,7 @@ determines the number of levels of the PDT. ], config:{lanes: 2, hspace: 1024, fontsize: 16}} .... -When second-stage address translation is not Bare, the `pdtp.PPN` field holds a +When second-stage address translation is not Bare and `DC.tc.GIPC = 0`, the `pdtp.PPN` field holds a guest PPN. The GPA of the root PDT is then converted by guest physical address translation process, as controlled by the `iohgatp`, into a supervisor physical address. Translating addresses of PDT using a second-stage page table, allows the @@ -570,6 +612,12 @@ PDT to be held in memory allocated by the guest OS and allows the guest OS to directly edit the PDT to associate a virtual-address space identified by a first-stage page table with a `process_id`. +When second-stage address translation is not Bare and `DC.tc.GIPC = 1`, +the `pdtp.PPN` field holds a supervisor PPN. The PDT then resides in the supervisor physical +address space, and the guest OS may use the virtio-IOMMU API to have the +hypervisor edit the PDT to associate a +virtual-address space identified by a VS-stage page table with a `process_id`. + [[PDTP_MODE_ENC]] .Encoding of `pdtp.MODE` field [width=75%] @@ -775,9 +823,9 @@ A valid (`V==1`) non-leaf PDT entry holds the PPN of the next-level PDT. ==== Leaf PDT entry The leaf PDT page is indexed by `PDI[0]` and holds the 16-byte process-context -(`PC`). +(`PC`) when `DC.tc.GIPC = 0` and `capabilities.MSI_FLAT = 0`. -.Process-context +.Base-format process-context [wavedrom, , ] ....
@@ -787,7 +835,41 @@ The leaf PDT page is indexed by `PDI[0]` and holds the 16-byte process-context ], config:{lanes: 2, hspace: 1024, fontsize: 16}} .... -The `PC` is interpreted as two 64-bit doublewords. The byte order of each of the +The leaf PDT page is indexed by `PDI[0]` and holds the 32-byte process-context +(`PC`) when `DC.tc.GIPC = 1` and `capabilities.MSI_FLAT = 0`. + +.Extended-format process-context + +[wavedrom, , ] +.... +{reg: [ + {bits: 64, name: 'reserved'}, + {bits: 64, name: 'IO Hyp. guest addr. translation and prot. (iohgatp)'}, + {bits: 64, name: 'Translation-attributes (ta)'}, + {bits: 64, name: 'First-stage-context (fsc)'}, +], config:{lanes: 2, hspace: 1024, fontsize: 16}} +.... + +The leaf PDT page is indexed by `PDI[0]` and holds the 64-byte process-context +(`PC`) when `DC.tc.GIPC = 1` and `capabilities.MSI_FLAT = 1`. + +.Extended-format process-context with MSI-page-table + +[wavedrom, , ] +.... +{reg: [ + {bits: 64, name: 'reserved'}, + {bits: 64, name: 'MSI-page-table pointer (msiptp)'}, + {bits: 64, name: 'MSI-address-mask (msi_addr_mask)'}, + {bits: 64, name: 'MSI-address-pattern (msi_addr_pattern)'}, + {bits: 64, name: 'reserved'}, + {bits: 64, name: 'IO Hyp. guest addr. translation and prot. (iohgatp)'}, + {bits: 64, name: 'Translation-attributes (ta)'}, + {bits: 64, name: 'First-stage-context (fsc)'}, +], config:{lanes: 2, hspace: 1024, fontsize: 16}} +.... + +The `PC` is interpreted as two, four, or eight 64-bit doublewords, depending on its format. The byte order of each of the doublewords in memory, little-endian or big-endian, is the endianness as determined by `DC.tc.SBE`. The IOMMU may read the `PC` fields in any order. @@ -849,7 +931,7 @@ field controls the supported paged virtual-memory schemes. When `PC.fsc.MODE` is not `Bare`, the `PC.fsc.PPN` field holds the PPN of the root page of a first-stage page table.
-When second-stage address translation is not Bare, the `PC.fsc.PPN` field holds +When second-stage address translation is not Bare and `DC.tc.GIPC = 0`, the `PC.fsc.PPN` field holds a guest PPN of the root of a first-stage page table. Addresses of the first-stage page table entries are then converted by guest physical address translation process, as controlled by the `DC.iohgatp`, into a supervisor physical address. @@ -857,6 +939,10 @@ A guest OS may thus directly edit the first-stage page table to limit access by the device to a subset of its memory and specify permissions for the device accesses. +When second-stage address translation is not Bare and `DC.tc.GIPC = 1`, the `PC.fsc.PPN` field holds +a supervisor PPN of the root of a first-stage page table. A guest OS may edit +the first-stage page table with the help of the hypervisor. + [NOTE] ==== The `PC.ta.PSCID` identifies an address space. If an identical @@ -866,6 +952,15 @@ the first page table or the second page table. These are the only expected behaviors. ==== +===== IO hypervisor guest address translation and protection (`iohgatp`) +This field is the same as `DC.iohgatp`. + +===== MSI page table pointer (`msiptp`) +This field is the same as `DC.msiptp`. + +===== MSI address mask (`msi_addr_mask`) and pattern (`msi_addr_pattern`) +These fields are the same as `DC.msi_addr_mask` and `DC.msi_addr_pattern`. + [[PC_MISCONFIG]] ==== Process-context configuration checks @@ -1039,9 +1134,9 @@ is as follows: ==== Process to locate the Process-context The device-context provides the PDT root page PPN (`pdtp.ppn`). When -`DC.iohgatp.mode` is not `Bare`, `pdtp.PPN` as well as `pdte.PPN` are Guest +`DC/PC.iohgatp.mode` is not `Bare`, `pdtp.PPN` as well as `pdte.PPN` are Guest Physical Addresses (GPA) which must be translated into Supervisor Physical -Addresses (SPA) using the second-stage page table pointed to by `DC.iohgatp`. +Addresses (SPA) using the second-stage page table pointed to by `DC/PC.iohgatp`.
The memory accesses to the PDT are treated as implicit read memory accesses by the second-stage. @@ -1051,7 +1146,7 @@ The process to locate the Process-context for a transaction using its . Let `a` be `pdtp.PPN x 2^12^` and let `i = LEVELS - 1`. When `pdtp.MODE` is `PD20`, `LEVELS` is three. When `pdtp.MODE` is `PD17`, `LEVELS` is two. When `pdtp.MODE` is `PD8`, `LEVELS` is one. -. If `DC.iohgatp.mode != Bare`, then `a` is a GPA. Invoke the process +. If `DC/PC.iohgatp.mode != Bare`, then `a` is a GPA. Invoke the process to translate `a` to a SPA as an implicit memory access. If faults occur during second-stage address translation of `a` then stop and report the fault detected by the second-stage address translation process. The translated `a` is used in @@ -1066,7 +1161,7 @@ The process to locate the Process-context for a transaction using its . If any bits or encoding that are reserved for future standard use are set within `pdte`, stop and report "PDT entry misconfigured" (cause = 267). . Let `i = i - 1` and let `a = pdte.PPN x 2^12^`. Go to step 2. -. Let `PC` be the value of the 16-bytes at address `a + PDI[0] x 16`. If accessing `PC` +. Let `S` be the size in bytes of the `PC` format in use (16, 32, or 64). Let `PC` be the value of the `S` bytes at address `a + PDI[0] x S`. If accessing `PC` violates a PMA or PMP check, then stop and report "PDT entry load access fault" (cause = 265). If `PC` access detects a data corruption (a.k.a. poisoned data), then stop and report "PDT data corruption" @@ -1120,7 +1215,7 @@ file and translating the address using the MSI page table is as follows: ** `y = 1 0 1 0 0 1 1 0` ** then the value of `extract(x, y)` has bits `0 0 0 0 a c f g`. -. Let `m` be `(DC.msiptp.PPN x 2^12^)`. +. Let `m` be `(DC/PC.msiptp.PPN x 2^12^)`. . Let `msipte` be the value of sixteen bytes at address `(m | (I x 16))`. If accessing `msipte` violates a PMA or PMP check, then stop and report "MSI PTE load access fault" (cause = 261). @@ -1169,8 +1264,8 @@ operations may be performed by the I/O bridge.
When `capabilities.AMO_HWAD` is 1, the IOMMU supports updating the A and D bits in PTEs atomically. When updating of A and D bits in second-stage PTEs is enabled -(`DC.tc.GADE=1`) and/or updating of A and D bits in first-stage PTEs is enabled -(`DC.tc.SADE=1`) the following rules apply: +(`DC/PC.tc.GADE=1`) and/or updating of A and D bits in first-stage PTEs is enabled +(`DC/PC.tc.SADE=1`) the following rules apply: . The A and/or D bit updates by the IOMMU must follow the rules specified by the Privileged specification for validity, permission checking, and atomicity. diff --git a/src/iommu_intro.adoc b/src/iommu_intro.adoc index f7d341a0..b709409e 100644 --- a/src/iommu_intro.adoc +++ b/src/iommu_intro.adoc @@ -103,7 +103,7 @@ accesses. The IOMMU may employ similar address translation caches, referred as IOMMU Address Translation Cache (IOATC). The IOMMU provides mechanisms for software to synchronize the IOATC with the memory resident data structures used for address translation when they are modified. Software may configure the -device context with a software defined context identifier called guest +device/process context with a software defined context identifier called guest soft-context identifier (`GSCID`) to indicate that a collection of devices are assigned to the same VM and thus access a common virtual address space. Software may configure the process context with a software defined context @@ -139,10 +139,10 @@ would naturally be subject to the same address translation that an IOMMU applies to other memory writes. However, the RISC-V Advanced Interrupt Architecture cite:[AIA] requires that IOMMUs treat MSIs directed to virtual machines specially, in part to simplify software, and in part to allow optional -support for memory-resident interrupt files. The device context is configured by +support for memory-resident interrupt files. 
The device/process context is configured by software with parameters to identify memory accesses to a virtual interrupt file and to be translated using a MSI address translation table configured by software -in the device context. +in the device/process context. === Glossary .Terms and definitions @@ -170,6 +170,7 @@ in the device context. | DMA | Direct Memory Access. | GPA | Guest Physical Address: An address in the virtualized physical memory space of a virtual machine. +| GIPC | G-stage page table In Process Context. | GSCID | Guest soft-context identifier: An identification number used by software to uniquely identify a collection of devices assigned to a virtual machine. An IOMMU may tag IOATC diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index 56305e16..e1cf487b 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -151,7 +151,8 @@ the IOMMU. At reset, the register shall contain the IOMMU supported features. {bits: 1, name: 'PD8'}, {bits: 1, name: 'PD17'}, {bits: 1, name: 'PD20'}, - {bits: 15, name: 'reserved'}, + {bits: 1, name: 'GIPC'}, + {bits: 14, name: 'reserved'}, {bits: 8, name: 'custom'}, ], config:{lanes: 8, hspace:1024}} .... @@ -221,7 +222,8 @@ the IOMMU. At reset, the register shall contain the IOMMU supported features. |38 |`PD8` |RO | One level PDT with 8-bit process_id supported. |39 |`PD17` |RO | Two level PDT with 17-bit process_id supported. |40 |`PD20` |RO | Three level PDT with 20-bit process_id supported. -|55:41 | reserved |RO | Reserved for standard use +|41 |`GIPC` |RO | G-stage page table In Process Context supported. 
+|55:42 | reserved |RO | Reserved for standard use |63:56 |_custom_ |RO | _Designated for custom use_ |=== diff --git a/src/iommu_sw_guidelines.adoc b/src/iommu_sw_guidelines.adoc index 8585bb1e..3400f863 100644 --- a/src/iommu_sw_guidelines.adoc +++ b/src/iommu_sw_guidelines.adoc @@ -146,9 +146,9 @@ device with `device_id = D`) then the following invalidations must be performed: * `IODIR.INVAL_DDT` with `DV=1` and `DID=D` * If `DC.tc.PDTV==1` then `IODIR.INVAL_PDT` with `DV=1`, `PV=0`, and `DID=D` -* If `DC.iohgatp.MODE != Bare` -** `IOTINVAL.VMA` with `GV=1`, `AV=PSCV=0`, and `GSCID=DC.iohgatp.GSCID` -** `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC.iohgatp.GSCID` +* If `DC/PC.iohgatp.MODE != Bare` +** `IOTINVAL.VMA` with `GV=1`, `AV=PSCV=0`, and `GSCID=DC/PC.iohgatp.GSCID` +** `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC/PC.iohgatp.GSCID` * else ** If `DC.tc.PDTV==1 || DC.tc.PDTV == 0 && DC.fsc.MODE == Bare` *** `IOTINVAL.VMA` with `GV=AV=PSCV=0` @@ -171,8 +171,8 @@ If software changes a leaf-level PDT entry (i.e, a process context (`PC`), for performed: * `IODIR.INVAL_PDT` with `DV=1`, `PV=1`, `DID=D` and `PID=P` -* If `DC.iohgatp.MODE != Bare` -** `IOTINVAL.VMA` with `GV=1`, `AV=0`, `PV=1`, `GSCID=DC.iohgatp.GSCID`, +* If `DC/PC.iohgatp.MODE != Bare` +** `IOTINVAL.VMA` with `GV=1`, `AV=0`, `PV=1`, `GSCID=DC/PC.iohgatp.GSCID`, and `PSCID=PC.PSCID` * else ** `IOTINVAL.VMA` with `GV=0`, `AV=0`, `PV=1`, and `PSCID=PC.PSCID` @@ -188,12 +188,12 @@ number `I` that corresponds to an untranslated MSI address `A` then the followin invalidations must be performed: * `IOTINVAL.GVMA` with `GV=AV=1`, `ADDR[63:12]=A[63:12]` and - `GSCID=DC.iohgatp.GSCID` + `GSCID=DC/PC.iohgatp.GSCID` To invalidate all cache entries from a MSI page table the following invalidations must be performed: -* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC.iohgatp.GSCID` +* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC/PC.iohgatp.GSCID` Between a change to the MSI PTE and when an 
invalidation command to invalidate the cached PTE is processed by the IOMMU, the IOMMU may use the old PTE value @@ -207,19 +207,20 @@ If software changes a leaf second-stage page-table entry of a VM where the chang affects translation for a guest-PPN `G` then the following invalidations must be performed: -* `IOTINVAL.GVMA` with `GV=AV=1`, `GSCID=DC.iohgatp.GSCID`, and `ADDR[63:12]=G` +* `IOTINVAL.GVMA` with `GV=AV=1`, `GSCID=DC/PC.iohgatp.GSCID`, and `ADDR[63:12]=G` If software changes a non-leaf second-stage page-table entry of a VM then the following invalidations must be performed: -* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, `GSCID=DC.iohgatp.GSCID` +* `IOTINVAL.GVMA` with `GV=1`, `AV=0`, `GSCID=DC/PC.iohgatp.GSCID` -The `DC` has fields that hold a guest-PPN. An implementation may translate such +The `DC` has fields that hold a guest-PPN when `DC.tc.GIPC = 0`. An implementation may translate such fields to a supervisor-PPN as part of caching the `DC`. If the second-stage page table update affects translation of guest-PPN held in the `DC` then software must invalidate all such cached `DC` using `IODIR.INVAL_DDT` with `DV=1` and `DID` set to the corresponding `device_id`. Alternatively, an `IODIR.INVAL_DDT` with `DV=0` may be used to invalidate all cached `DC`. +When `DC.tc.GIPC = 1`, the `DC` has no fields that hold a guest-PPN. Between a change to the second-stage PTE and when an invalidation command to invalidate the cached PTE is processed by the IOMMU, the IOMMU may use the @@ -238,7 +239,7 @@ specified in <>. When a change is made to a first-stage page table, and the second-stage is not Bare, then software must perform invalidations using `IOTINVAL.VMA` with -`GV=1`, `GSCID=DC.iohgatp.GSCID` and `AV` and `PSCV` operands appropriate for +`GV=1`, `GSCID=DC/PC.iohgatp.GSCID` and `AV` and `PSCV` operands appropriate for the modification as specified in <>. Between a change to the first-stage PTE and when an invalidation command to