Resolves #4317
Concludes #4128

The implementation of this proposal requires massive changes to the stellar-core codebase and touches almost every subsystem. There are some paradigm shifts in how the program executes, which I discuss below for posterity. The same ideas are reflected in code comments as well, since they will be important for code maintenance and extensibility.

## Database access

Currently, only the Postgres DB backend is supported, as it required minimal changes to how DB queries are structured (Postgres provides a fairly nice concurrency model). SQLite's concurrency support is a lot more rudimentary: only a single writer is allowed, and the whole database is locked during writing. Supporting it necessitates further changes in core (such as splitting the database in two). Given that most network infrastructure is on Postgres right now, SQLite support can be added later.

### Reduced responsibilities of SQL

SQL tables have been trimmed as much as possible to avoid conflicts; essentially, we only store persistent state such as the latest LCL and SCP history, as well as the legacy OFFER table.

## Asynchronous externalize flow

There are three important subsystems in core that are in charge of tracking consensus, externalizing and applying ledgers, and advancing the state machine to catchup or synced state:

- Herder: receives SCP messages, forwards them to SCP, decides whether a ledger is externalized, and triggers voting for the next ledger.
- LedgerManager: implements closing of a ledger, sets catchup vs. synced state, and advances and persists the last closed ledger.
- CatchupManager: keeps track of any externalized ledgers that are not LCL+1, i.e. future externalized ledgers, attempts to apply them to keep core in sync, and triggers catchup if needed.

Prior to this change, externalize handling took one of two paths:

- If core received LCL+1, it applied it immediately, meaning the flow externalize → closeLedger → set “synced” state happened in one synchronous function. After application, core triggered the next ledger, usually asynchronously, as it needs to wait to meet the 5s ledger time requirement.
- If core received ledger LCL+2..LCL+N, it buffered it asynchronously and continued buffering new ledgers. If core could not close the gap and apply everything sequentially, it would go into the catchup flow.

With the new changes, triggering ledger close moved to CatchupManager completely. Essentially, CatchupManager::processLedger became the centralized place to decide whether to apply a ledger or trigger catchup. Because ledger close happens in the background, the transition between externalize and “closeLedger → set synced” becomes asynchronous.

## Concurrent ledger close

The following core items moved to the background; each is listed with an explanation of why that is safe:

### Emitting meta

Ledger application is the only process that touches the meta pipe, so there are no conflicts with other subsystems.

### Writing checkpoint files

Only the background thread writes in-progress checkpoint files. The main thread deals exclusively with “complete” checkpoints, which, after completion, must not be touched by any subsystem except publishing.

### Updating ledger state

The rest of the system operates strictly on read-only BucketList snapshots and is unaffected by changing state. Note: there are still some calls to LedgerTxn in the codebase, but those only appear on startup during setup (when the node is not operational) or in offline commands.
### Incrementing current LCL

Because ledger close moved to the background, the guarantees about ledger state and its staleness are now different. Previously, ledger state queried by subsystems outside of apply was always up-to-date. With this change, the snapshot used by the main thread may become slightly stale (if the background thread has just closed a new ledger, but the main thread has not refreshed its snapshot yet). The different uses of the main thread's ledger state must be treated with caution and evaluated individually:

- When it is safe: in cases where LCL is used more like a heuristic or an approximation, and program correctness does not depend on its exact state. Example: post-externalize cleanup of the transaction queue. We load LCL's close time to purge invalid transactions from the queue. This is safe because even if LCL is updated while we do this, the queue remains in a consistent state; in fact, anything in the transaction queue is essentially an approximation, so a slightly stale snapshot is safe to use.
- When it is not safe: when LCL is needed in places where the latest ledger state is critical, like voting in SCP, validating blocks, etc. To avoid unnecessary headaches, we introduce a new invariant: “applying” is a new state in the state machine, which does not allow voting or triggering the next ledger. Core must first complete applying to be able to vote on the “latest state”. In the meantime, if ledgers arrive while applying, we treat them like “future ledgers” and apply the same procedures in Herder that we do today (don’t perform validation checks, don’t vote on them, and buffer them in a separate queue). The state machine remains on the main thread _only_, which ensures SCP can safely execute as long as the state transitions are correct (for example, a block production function can safely grab the LCL at the beginning of the function without worrying that it might change in the background).

### Reflecting state change in the BucketList

Closing a ledger is the only place in the code that updates the BucketList; other subsystems may only read it. An example is garbage collection, which queries the latest BucketList state to decide which buckets to delete. These reads are protected with a mutex (the same LCL mutex used in LedgerManager, as the BucketList is conceptually part of LCL as well).
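As an illustration of that locking discipline, the sketch below (again with made-up names, and assuming a single writer on the apply thread) shows the close path and a reader such as garbage collection sharing one mutex that guards LCL together with the BucketList state it owns.

```cpp
// Sketch of the single-mutex discipline around LCL and the BucketList;
// class and member names are hypothetical.
#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

struct LastClosedLedger
{
    uint32_t ledgerSeq{0};
    std::vector<std::string> liveBucketHashes; // stand-in for BucketList state
};

class LedgerStateOwner
{
    std::mutex mLclMutex; // one mutex guards both LCL and the BucketList
    LastClosedLedger mLcl;

  public:
    // Apply/close thread: the only writer.
    void
    advanceLcl(uint32_t newSeq, std::vector<std::string> newLiveBuckets)
    {
        std::lock_guard<std::mutex> lock(mLclMutex);
        mLcl.ledgerSeq = newSeq;
        mLcl.liveBucketHashes = std::move(newLiveBuckets);
    }

    // Main thread, e.g. garbage collection: read-only. Taking a consistent
    // copy of the referenced buckets means deletion decisions never race
    // with a concurrent ledger close.
    std::vector<std::string>
    getLiveBucketHashes()
    {
        std::lock_guard<std::mutex> lock(mLclMutex);
        return mLcl.liveBucketHashes;
    }
};
```

Holding a single mutex over both fields is what makes “the BucketList is conceptually part of LCL” concrete: a reader can never observe a new LCL paired with an old BucketList, or vice versa.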