Add state archival getledgerentry endpoint #4623

SirTyson · 2025-01-15T07:42:21Z

Description

Resolves the p23 portions of #4397 by adding the getledgerentry HTTP endpoint, which returns LedgerEntries with the relevant TTL and state information.

This is also the cutoff point for early RPC integration of p23. Compared to the last core release (22.1), RPC will need to integrate the following changes. Note that all these changes are protocol gated and are only in effect if the package is running protocol 23.

RPC Integration Changes

Persistent entry eviction

Persistent CONTRACT_DATA entries and all CONTRACT_CODE entries are now subject to eviction. This means that an expired entry can be evicted and deleted from the live BucketList during eviction scans. evictedTemporaryLedgerKeys will still contain evicted temporary CONTRACT_DATA keys and TTLs as before, but will now additionally hold CONTRACT_CODE and persistent CONTRACT_DATA keys along with their TTLs. evictedPersistentLedgerEntries is still never populated. With the final p23 build we will rename evictedTemporaryLedgerKeys, but the rename is not included in this build. XDR changes will come later.

Changes to Restoration Meta

Restoration meta will depend on the "state" of the entry being restored. An entry being restored is in one of two states:

archived: Entry has an expired TTL, but still lives on the live BucketList. Entry has not yet been evicted, so the key has not been populated in the evictedTemporaryLedgerKeys vector yet.
evicted: Entry has been fully evicted from the live BucketList and is now stored in the Hot Archive BucketList. Key has been emitted in the evictedTemporaryLedgerKeys vector.

Note: I realize this terminology is a bit confusing because we now have two "archived" states. That being said, the evicted state is only relevant to RPC and core and is completely abstracted from developers and people invoking contracts. Given the feedback on the original expiration terminology, I'd like to avoid it if at all possible and use the archived vs evicted terminology.

RestoreFootprintOp will produce meta as follows.

archived keys
In protocols < 23, meta was as follows:

CONTRACT_CODE/CONTRACT_DATA entry:
no meta

TTL entry:
LEDGER_ENTRY_STATE(oldValue), LEDGER_ENTRY_UPDATED(newValue)

In protocol >= 23, meta is as follows:

CONTRACT_CODE/CONTRACT_DATA entry:
LEDGER_ENTRY_RESTORE(value)

TTL entry:
LEDGER_ENTRY_STATE(oldValue), LEDGER_ENTRY_RESTORE(newValue)

evicted keys

In protocol < 23, there were no evicted key restorations.

In protocol >= 23, meta is as follows:

CONTRACT_CODE/CONTRACT_DATA entry:
LEDGER_ENTRY_RESTORE(value)

TTL entry:
LEDGER_ENTRY_RESTORE(value)
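
For test assertions, the meta rules above can be written down as a small lookup table. This is only a sketch transcribed from this description: the function name, the "data_or_code" / "ttl" labels, and the table layout are made up for illustration and are not part of core or RPC.

```python
from typing import Dict, List, Tuple

# Expected RestoreFootprintOp meta, transcribed from the description above.
# Key: (protocol >= 23, entry state). Value: change types emitted, in order,
# for the CONTRACT_CODE/CONTRACT_DATA entry and for its TTL entry.
# The labels "data_or_code" and "ttl" are illustrative names only.
RESTORE_META: Dict[Tuple[bool, str], Dict[str, List[str]]] = {
    (False, "archived"): {
        "data_or_code": [],  # no meta before protocol 23
        "ttl": ["LEDGER_ENTRY_STATE", "LEDGER_ENTRY_UPDATED"],
    },
    (True, "archived"): {
        "data_or_code": ["LEDGER_ENTRY_RESTORE"],
        "ttl": ["LEDGER_ENTRY_STATE", "LEDGER_ENTRY_RESTORE"],
    },
    (True, "evicted"): {
        "data_or_code": ["LEDGER_ENTRY_RESTORE"],
        "ttl": ["LEDGER_ENTRY_RESTORE"],
    },
}


def expected_restore_meta(protocol: int, state: str) -> Dict[str, List[str]]:
    """Return the change types a restore of an entry in `state` should emit."""
    if state not in ("archived", "evicted"):
        raise ValueError(f"not a restorable state: {state!r}")
    key = (protocol >= 23, state)
    if key not in RESTORE_META:
        # Before protocol 23 there were no evicted key restorations.
        raise ValueError("evicted entries cannot be restored before protocol 23")
    return RESTORE_META[key]
```

A test could compare the change types it observes in transaction meta against expected_restore_meta(23, "evicted") and similar calls.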

`getledgerentry` captive-core endpoint for Ledger State

With protocol 23, ledger DBs are getting more complicated, as entries are stored both in the live BucketList DB and in the Hot Archive DB. To properly simulate TXs, it will be necessary to know which DB an entry is in, as well as its "state" with respect to its TTL value. This logic is complicated to replicate outside of core, especially since much of the state information is intrinsic to the structure of the BucketList and may be expensive to replicate in SQL. Instead of maintaining a DB of ledger state, it is recommended that RPC use the new getledgerentry captive-core endpoint for all LedgerState access. Core now comes with a multithreaded HTTP server implementation that seems sufficiently fast over localhost for all state accesses (see this).

By default, this "query server" is disabled in core. It can be enabled and configured with the following captive-core flags:

# HTTP_QUERY_PORT (integer) default 0
# What port stellar-core listens for query commands on,
# such as getledgerentryraw.
# If set to 0, disable HTTP query interface entirely.
# Must not be the same as HTTP_PORT if not 0.
HTTP_QUERY_PORT=0

# QUERY_THREAD_POOL_SIZE (integer) default 4
# Number of threads available for processing query commands.
# If HTTP_QUERY_PORT == 0, this option is ignored.
QUERY_THREAD_POOL_SIZE=4

# QUERY_SNAPSHOT_LEDGERS (integer) default 0
# Number of historical ledger snapshots to maintain for
# query commands. Note: Setting this to large values may
# significantly impact performance. Additionally, these
# snapshots are a "best effort" only and not persisted on
# restart. On restart, only the current ledger will be
# available, with snapshots available as ledgers close.
QUERY_SNAPSHOT_LEDGERS=0
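
The rules stated in the comments above (port 0 disables the interface, and a non-zero query port must differ from HTTP_PORT) can be checked before launching captive core. The helper below is a hypothetical sketch that takes the config as a plain dict; it is not how core validates its config, and it assumes the caller passes HTTP_PORT explicitly rather than relying on core's default.

```python
def query_server_enabled(cfg: dict) -> bool:
    """Check query-server settings against the rules in the config comments.

    Returns True if the query server would be enabled, False if it is
    disabled, and raises ValueError on a port conflict.
    `cfg` is a hypothetical dict of already-parsed config values.
    """
    query_port = cfg.get("HTTP_QUERY_PORT", 0)  # documented default is 0
    if query_port == 0:
        # Query interface disabled entirely; QUERY_THREAD_POOL_SIZE is ignored.
        return False
    if query_port == cfg.get("HTTP_PORT"):
        raise ValueError("HTTP_QUERY_PORT must not be the same as HTTP_PORT")
    return True
```

The port numbers a caller passes in are its own choice; nothing in this sketch implies a recommended value.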

Once enabled, the following endpoint will be available:

getledgerentry

Used to query both live and archived LedgerEntries. While getledgerentryraw does simple key-value lookup
on the live ledger, getledgerentry will query a given key in both the live BucketList and Hot Archive BucketList.
It will also report whether an entry is archived, evicted, or live, and return the entry's current TTL value.

A POST request with the following body:

ledgerSeq=NUM&key=Base64&key=Base64...

ledgerSeq: An optional parameter, specifying the ledger snapshot to base the query on.
If the specified ledger is not available, a 404 error will be returned. If this parameter
is not set, the current ledger is used.
key: A series of Base64 encoded XDR strings specifying the LedgerKeys to query. TTL keys
must not be queried; a request that includes one will return a 400 error.
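
As a rough illustration of the request format, the body can be assembled with the standard library. This sketch assumes the query server decodes a standard percent-encoded form body, which matters because Base64 strings can contain "+", "/" and "=". The function name and the sample key strings are placeholders, not real LedgerKeys.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def build_getledgerentry_body(
    keys_b64: Iterable[str], ledger_seq: Optional[int] = None
) -> str:
    """Build the POST body: ledgerSeq=NUM&key=Base64&key=Base64...

    Assumption: the server accepts percent-encoded form data. If it expects
    raw Base64 instead, join the parameters by hand rather than using urlencode.
    """
    params = []
    if ledger_seq is not None:
        # ledgerSeq is optional; when omitted the current ledger is used.
        params.append(("ledgerSeq", str(ledger_seq)))
    keys = list(keys_b64)
    if not keys:
        raise ValueError("at least one key is required")
    # The "key" parameter is repeated once per LedgerKey.
    params.extend(("key", k) for k in keys)
    return urlencode(params)
```

For example, build_getledgerentry_body(["AAAA+/8=", "BBBB"], ledger_seq=100) yields a body with one ledgerSeq parameter followed by two key parameters.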

A JSON payload is returned as follows:

{
"entries": [
     {"e": "Base64-LedgerEntry", "state": "live", /*optional*/ "ttl": uint32},
     {"e": "Base64-LedgerKey", "state": "new"},
     {"e": "Base64-LedgerEntry", "state": "archived"},
     {"e": "Base64-LedgerEntry", "state": "evicted"}
],
"ledger": ledgerSeq
}

entries: A list of entries for each queried LedgerKey. Every key queried is guaranteed to
have a corresponding entry returned.
e: Either the LedgerEntry or LedgerKey for a given key encoded as a Base64 string. If a key
is live, archived, or evicted, e contains the corresponding LedgerEntry. If a key does not exist
(including expired temporary entries), e contains the corresponding LedgerKey.
state: One of the following values:
- live: Entry is live.
- new: Entry does not exist. Either the entry has never existed or is an expired temp entry.
- archived: Entry is archived, but not yet evicted, counts towards in-memory resources.
- evicted: Entry is archived and evicted, counts towards disk resources.
ttl: An optional value, only returned for live Soroban entries. Contains
a uint32 value for the entry's liveUntilLedgerSeq.
ledger: The ledger number on which the query was performed.

Classic entries will always return a state of live or new.
If a classic entry does not exist, it will have a state of new.

Similarly, temporary Soroban entries will always return a state of live or
new. If a temporary entry does not exist or has expired, it
will have a state of new.

This endpoint will always give correct information for archived entries. Even
if an entry has been archived and evicted to the Hot Archive, this endpoint will
still return the archived entry's full LedgerEntry as well as the proper state.
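
A client-side reader for the payload described above might look like the following sketch. The class and function names are invented for illustration. The code does not assume that entries come back in the same order as the queried keys, since the description only guarantees one entry per key. The needs_restore helper reflects one way a caller might treat archived and evicted entries.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

VALID_STATES = ("live", "new", "archived", "evicted")


@dataclass
class EntryResult:
    xdr_b64: str        # "e": a LedgerEntry, or a LedgerKey when state is "new"
    state: str          # one of VALID_STATES
    ttl: Optional[int]  # liveUntilLedgerSeq, only present for live Soroban entries

    @property
    def needs_restore(self) -> bool:
        # Archived and evicted entries both have to be restored before use.
        return self.state in ("archived", "evicted")


def parse_getledgerentry_response(
    raw: str, num_keys_queried: int
) -> Tuple[int, List[EntryResult]]:
    """Parse the JSON payload into (ledger, results), checking basic invariants."""
    doc = json.loads(raw)
    results = []
    for item in doc["entries"]:
        state = item["state"]
        if state not in VALID_STATES:
            raise ValueError(f"unexpected state: {state!r}")
        results.append(EntryResult(item["e"], state, item.get("ttl")))
    # Every queried key is guaranteed to have a corresponding entry.
    if len(results) != num_keys_queried:
        raise ValueError("expected exactly one entry per queried key")
    return doc["ledger"], results
```

Since "e" holds a LedgerKey rather than a LedgerEntry when the state is new, callers should check state before decoding the XDR.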

RPC Testing

To test the new changes, RPC will want to have tests that restore entries in both the archived and evicted states. What I've been doing in my tests is populating state with a bunch of persistent entries and letting them expire, with my eviction scan parameters set such that only 1 or 2 entries are evicted at a time. You can keep track of an entry's starting TTL and whether or not you have seen eviction meta to determine its state within the test. These flags are helpful:

OVERRIDE_EVICTION_PARAMS_FOR_TESTING=true

# Scan 1 million bytes per ledger. This lets you evict aggressively, but if it slows down the test too
# much it can be reduced
TESTING_EVICTION_SCAN_SIZE=1000000

# Entries are eligible for eviction earlier than normal
TESTING_STARTING_EVICTION_SCAN_LEVEL=2

# A maximum of 2 entries will be evicted per ledger. This helps the test maintain a good
# mix of entries in both the evicted and archived state
TESTING_MAX_ENTRIES_TO_ARCHIVE=2

# Entries are eligible for eviction sooner
TESTING_MINIMUM_PERSISTENT_ENTRY_LIFETIME=16

Make sure you also have the QUERY_SERVER flags set so you can use the core endpoint. I think this has been thorough enough, but if I missed anything, CAP 62 and CAP 66 have full specs.
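
The bookkeeping mentioned above (starting TTL plus whether eviction meta has been seen) could be sketched roughly as follows. The class and method names are hypothetical. The sketch assumes an entry is live while its liveUntilLedgerSeq is greater than or equal to the current ledger, and that a restore resets the eviction flag.

```python
from typing import Dict


class EntryTracker:
    """Test-side bookkeeping for the expected state of persistent entries."""

    def __init__(self) -> None:
        self._live_until: Dict[str, int] = {}  # key -> liveUntilLedgerSeq
        self._evicted: Dict[str, bool] = {}    # key -> eviction meta seen

    def track(self, key: str, live_until_ledger: int) -> None:
        self._live_until[key] = live_until_ledger
        self._evicted[key] = False

    def on_evicted(self, key: str) -> None:
        # Call when the key shows up in evictedTemporaryLedgerKeys.
        self._evicted[key] = True

    def on_restored(self, key: str, new_live_until_ledger: int) -> None:
        # A restore makes the entry live again with a fresh TTL.
        self.track(key, new_live_until_ledger)

    def expected_state(self, key: str, current_ledger: int) -> str:
        if self._evicted[key]:
            return "evicted"
        if current_ledger > self._live_until[key]:
            return "archived"  # expired, but still on the live BucketList
        return "live"
```

The value returned by expected_state can then be compared with the state reported by getledgerentry before choosing which entries to restore.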

Checklist

Reviewed the contributing document
Rebased on top of master (no merge commits)
Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
Compiles
Ran all tests
If change impacts performance, include supporting evidence per the performance document

SirTyson added 3 commits January 14, 2025 17:31

Added Hot BucketList support to History Archive

3c0b7cf

assumeState and catchup tests for Hot Archive BucketList

796392c

Fixed tests

c6d900c

SirTyson force-pushed the getledgerentry-endpoint branch 3 times, most recently from 474ca65 to 0865066 on January 15, 2025 20:15

Added state archival getledgerentry http endpoint

4cd8801

SirTyson force-pushed the getledgerentry-endpoint branch from 0865066 to 4cd8801 on January 15, 2025 20:16
