Showing 1 changed file with 88 additions and 0 deletions.

# An Ideal API for Consensus and the Ledger

This is written from the point of view of a consumer (i.e. `cardano-db-sync`) of data from the
consensus and ledger layers. It describes the problems that I, as the main dev on `db-sync`, see
in what we have now, and then gives top-level details of what I think consensus and ledger
should provide to a consumer like `db-sync`.

### Problems with What We Have Now

Consensus (along with the networking code) and ledger-specs are developed in two separate
Git repositories and neither has a well-thought-out or evolved API. Instead they simply expose
their internals. The different eras (Shelley, Allegra, Mary etc) all have their own data
types which are unified using type families. Unfortunately, with type families, changes in the
library can cause particularly obscure error messages in client code. However useful type
families might be for developing the code for the different eras, having to deal with the
different types, even when unified using type families, only makes things more difficult for
clients like `db-sync`.

In order to populate its database, `db-sync` currently requires:

* A `LocalChainSync` client to get blocks from the blockchain.
* A ledger state to get information that is not recorded on chain (rewards, stake distribution
  etc).
* A ledger state query to get block information that is not on the blockchain (block time stamp,
  epoch number, slot within epoch etc).
* Data obtained as an aggregation query on data already in the database (sum of tx inputs,
  deposit value in a transaction etc).

Database aggregation queries are not a problem for things like populating the epoch table, but
they are a significant performance hit when calculating things like the deposit amount for
every transaction.

It should also be noted that because `db-sync` has to insert data into PostgreSQL, it will likely
be the first Cardano component to hit issues where its performance cannot keep up with a new
block arriving every 20 seconds. Anything that can reduce the amount of computation `db-sync`
needs to do improves its performance.

To summarize, the problems with the current approach:

* There are four different mechanisms to get all the information needed by the database.
* `db-sync` needs to maintain a copy of ledger state that is identical to the copy of the
  ledger state in the `node`. With `db-sync`, that means applying a block to an existing ledger
  state is done twice: once in the `node` and once in `db-sync`. The amount of code required to
  maintain ledger state is significant and basically duplicates code in `consensus`.
* Some data that goes into the database (e.g. the `deposit` field of the `tx` table) must be
  calculated using a database query. This needs to be done for every transaction in every
  block and is not a cheap operation.

### What Data is Needed?

The data needed by `db-sync` but not actually part of the blockchain (not necessarily an
exhaustive list):

* UTC time stamp for each block (currently calculated with a local state query).
* Epoch number and slot number within an epoch (currently calculated with a local state query).
* The rewards for each epoch (extracted from the ledger state).
* The stake distribution for each epoch (extracted from the ledger state).
* The deposit value for each transaction (requires a database query).

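To make the first two items concrete, here is a minimal sketch of the arithmetic involved. It assumes a single, fixed set of chain parameters, which is a simplification: on a real chain the slot length and epoch length can change at era boundaries, which is one reason a state query is currently needed. The names `ChainParams`, `exampleParams`, `epochAndSlot` and `slotToUtc` are hypothetical and do not come from consensus or ledger, and the parameter values are purely illustrative. The sketch uses the `time` package that ships with GHC.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (NominalDiffTime, UTCTime (..), addUTCTime)
import Data.Word (Word64)

-- | Hypothetical, simplified chain parameters. Assumption: they never change,
-- which is not true across era boundaries on a real chain.
data ChainParams = ChainParams
  { cpSystemStart :: !UTCTime          -- ^ Wall-clock time of slot 0.
  , cpSlotLength  :: !NominalDiffTime  -- ^ Duration of one slot, in seconds.
  , cpEpochLength :: !Word64           -- ^ Number of slots in one epoch.
  }

-- | Illustrative values only: one-second slots, 432000 slots per epoch.
exampleParams :: ChainParams
exampleParams = ChainParams
  { cpSystemStart = UTCTime (fromGregorian 2020 1 1) 0
  , cpSlotLength  = 1
  , cpEpochLength = 432000
  }

-- | Epoch number and slot within that epoch for an absolute slot number.
epochAndSlot :: ChainParams -> Word64 -> (Word64, Word64)
epochAndSlot params absSlot = absSlot `divMod` cpEpochLength params

-- | UTC time stamp at which the given absolute slot starts.
slotToUtc :: ChainParams -> Word64 -> UTCTime
slotToUtc params absSlot =
  addUTCTime (fromIntegral absSlot * cpSlotLength params) (cpSystemStart params)
```

With these illustrative parameters, `epochAndSlot exampleParams 432005` evaluates to `(1,5)`. If the producer of the data did this calculation, the consumer would not need to know the chain parameters at all.
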
### An Ideal API for a `db-sync`-like Consumer

The ideal API for a `db-sync`-like consumer would not require the consumer to maintain its
own ledger state. Instead, the API would provide two things:

* Enhanced or annotated blocks (these are not blocks as they would appear on the blockchain),
  with additional information like a UTC time stamp, current era, epoch number, slot within
  an epoch etc. These annotated blocks are era independent.
* Annotated blocks would not contain the blockchain version of transactions, but annotated
  transactions with things like the deposit value for each transaction. Like the annotated
  blocks, annotated transactions would also be era independent.
* Ledger events notifying of things that are difficult to obtain just by looking at the current
  block. This would include things like:

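```
data LedgerEvent
  = LedgerNewEra !Era                -- The chain has moved into a new era.
  | LedgerNewEpoch !EpochNo          -- A new epoch has started.
  | LedgerRewards !EpochNo !Rewards  -- Rewards for the given epoch.
  | LedgerStakeDist !StakeDist       -- A stake distribution.
  | LedgerBlock !AnnotatedBlock      -- An annotated block, as described above.
```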
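
As a rough illustration of how a consumer might use such a type, the sketch below fills in the missing types with hypothetical placeholders and folds a stream of events into a small summary. Everything other than the `LedgerEvent` constructors is an assumption made for the example: the fields of `AnnotatedBlock`, the representation of `Rewards` and `StakeDist`, and the `SyncState`, `applyEvent` and `runEvents` names are not part of any existing library.

```haskell
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical placeholder types; the real ones would come from the ledger.
data Era = Shelley | Allegra | Mary deriving (Eq, Show)
newtype EpochNo = EpochNo Word64 deriving (Eq, Show)
newtype Rewards = Rewards [(String, Integer)] deriving (Eq, Show)
newtype StakeDist = StakeDist [(String, Integer)] deriving (Eq, Show)

-- | Hypothetical era-independent block with the extra data db-sync needs.
data AnnotatedBlock = AnnotatedBlock
  { abBlockNo     :: !Word64
  , abEpochNo     :: !EpochNo
  , abSlotInEpoch :: !Word64
  , abTxDeposits  :: ![Integer]  -- ^ One deposit value per transaction.
  } deriving (Eq, Show)

data LedgerEvent
  = LedgerNewEra !Era
  | LedgerNewEpoch !EpochNo
  | LedgerRewards !EpochNo !Rewards
  | LedgerStakeDist !StakeDist
  | LedgerBlock !AnnotatedBlock
  deriving (Eq, Show)

-- | What this toy consumer remembers; a real one would write to a database.
data SyncState = SyncState
  { ssEra           :: !(Maybe Era)
  , ssEpoch         :: !(Maybe EpochNo)
  , ssBlocks        :: !Word64
  , ssTotalDeposits :: !Integer
  } deriving (Eq, Show)

emptyState :: SyncState
emptyState = SyncState Nothing Nothing 0 0

-- | Handle one event. No ledger state is needed on the consumer side.
applyEvent :: SyncState -> LedgerEvent -> SyncState
applyEvent st ev = case ev of
  LedgerNewEra era -> st { ssEra = Just era }
  LedgerNewEpoch epoch -> st { ssEpoch = Just epoch }
  LedgerRewards _ _ -> st  -- A real consumer would insert reward rows here.
  LedgerStakeDist _ -> st  -- Likewise for stake distribution rows.
  LedgerBlock blk -> st
    { ssBlocks = ssBlocks st + 1
    , ssTotalDeposits = ssTotalDeposits st + sum (abTxDeposits blk)
    }

-- | Process a whole list of events, oldest first.
runEvents :: [LedgerEvent] -> SyncState
runEvents = foldl' applyEvent emptyState
```

The point of the sketch is that the consumer only pattern matches on events. The deposit values arrive inside the annotated block, so no database aggregation query is needed to obtain them.
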
Rewards and stake distribution could be calculated incrementally by the ledger and the partial
results could be passed to `db-sync` as they become available.

There would then be something like the `LocalChainSync` protocol that passes `LedgerEvent`
objects over the connection rather than the current blockchain version of the blocks.
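
One possible shape for the client side of such a protocol is sketched below. It assumes, by analogy with the roll-forward and roll-backward messages of the existing chain sync protocol, that the server either delivers a new event tagged with a chain point or asks the client to roll back to an earlier point. The `ServerMsg` type and the `step` function are hypothetical, and the point and event types are left as type parameters so the sketch does not depend on any real library.

```haskell
-- | Hypothetical server-to-client messages for an event-based sync protocol.
data ServerMsg point event
  = RollForward point event  -- ^ A new event, tagged with its chain point.
  | RollBackward point       -- ^ Discard everything after this point.

-- | Apply one message to the client's history, kept newest entry first.
-- Assumption: points are ordered, so "after" can be tested with '>'.
step :: Ord point => [(point, event)] -> ServerMsg point event -> [(point, event)]
step history msg = case msg of
  RollForward pt ev -> (pt, ev) : history
  RollBackward pt -> dropWhile (\(p, _) -> p > pt) history
```

A real client would delete or update database rows on rollback rather than trim an in-memory list, but the control flow would be similar.
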