
Consolidated events vs meta in core [Discussion] #4624

Open
MonsieurNicolas opened this issue Jan 16, 2025 · 5 comments

Comments

@MonsieurNicolas
Contributor

Related to stellar/stellar-protocol#1553 (standardizing how classic token transfer/clawback/mint/burn events can be represented).

One way to think about how those events are generated is as the result of some transformation of the “raw” meta TransactionMetaV3 at the transaction and operation level (LedgerEntryChanges, potentially combined with some information from Transaction and TransactionResult).

For most modern use cases that need to support Soroban, the “raw” format is not useful, and systems instead need to rely on consolidated events.

This opens the possibility that the transformation above could be performed at the core layer.
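To make the idea concrete, here is a minimal sketch of the kind of transformation being described: deriving a unified transfer event from the before/after images of two trustlines. The types and field names below are invented, simplified stand-ins, not the actual XDR; a real implementation would operate on TransactionMetaV3 / LedgerEntryChanges and handle many more entry and operation types.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for before/after ledger entry state.
@dataclass
class TrustLineState:
    account: str   # account holding the trustline
    asset: str     # canonical asset string, e.g. "CODE:ISSUER"
    balance: int   # balance in stroops

@dataclass
class TransferEvent:
    source: str
    destination: str
    asset: str
    amount: int

def derive_transfer(src_before: TrustLineState, src_after: TrustLineState,
                    dst_before: TrustLineState, dst_after: TrustLineState) -> TransferEvent:
    """Turn a matched pair of before/after trustline changes into one unified event."""
    amount = src_before.balance - src_after.balance
    assert amount == dst_after.balance - dst_before.balance, "debit must match credit"
    assert src_before.asset == dst_before.asset, "both legs must move the same asset"
    return TransferEvent(source=src_before.account,
                         destination=dst_before.account,
                         asset=src_before.asset,
                         amount=amount)
```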

There are a few advantages to this approach:

  • The size of the meta is reduced (by over 60% for path payments)
  • Unified events are standardized at the lowest level of the stack, and protocol details are not leaked downstream (which they are if events are derived only from the operation type and LedgerEntryChanges)
  • It opens the door to trusted unified events in the future

Disadvantages:

  • The size savings may not be big enough to counter the additional complexity in core and downstream to support all combinations of data
  • Additional complexity for operators who need to adjust their configuration in order to generate “full meta” (or any of its variations; some systems may need to track actual values for certain ledger entries, such as WASM or NFT metadata)
  • Each iteration on the event format will require regenerating the entire meta data lake (and associated reingestion by all downstream systems)
  • Downstream systems need to transform the data anyway, in particular when enabling filtering (where the larger data-size gains are) in the early stages of the CDP

Appendix

Average ledger entry sizes in bytes (2025-01-15):

  • Account = 127
  • Offer = 139
  • TrustLine = 129

Path Payment

Meta size estimation

An Asset/Asset path payment will touch: the source trustline, the offer plus the counterparty's two trustlines, and the destination trustline
129+(139+2*129)+129=655 bytes

Meta for before/after -> 655*2=1,310 bytes.

Event size estimation

Each transfer:
16+3*32+4*8+16=160 bytes (if in the form "transfer", source, destination, asset issuer/contract, amount)
Or 160+57+4=221 bytes if including the asset string (with an asset code of anywhere from 1 to 12 characters)

2 transfers are 160*2=320 or 221*2=442 bytes.

A saving of 1310-442=868 bytes (66%).

For longer path payments, the savings are even higher.
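For reference, a short Python sketch reproducing the arithmetic above (the byte sizes and per-field breakdown are the estimates quoted in this appendix, not measured values):

```python
# Average ledger entry sizes (2025-01-15), in bytes.
ACCOUNT, OFFER, TRUSTLINE = 127, 139, 129

# Asset/Asset path payment: source trustline, offer plus the counterparty's
# two trustlines, destination trustline; raw meta stores before and after images.
entries_touched = TRUSTLINE + (OFFER + 2 * TRUSTLINE) + TRUSTLINE   # 655
raw_meta = 2 * entries_touched                                      # 1,310

# One transfer event in the form ("transfer", source, destination, asset, amount).
transfer = 16 + 3 * 32 + 4 * 8 + 16             # 160
transfer_with_asset_string = transfer + 57 + 4  # 221 (asset code 1-12 chars)

events = 2 * transfer_with_asset_string         # 442 for the two hops
saving = raw_meta - events                      # 868
print(f"saving: {saving} bytes ({saving / raw_meta:.0%} of the raw meta)")
```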

@MonsieurNicolas
Contributor Author

My current opinion based on the above: until full standardization of all events is achieved, generation at the core layer may not be desirable and is more of a premature optimization.

@leighmcculloch
Member

The motivation is improving the quality of the data and the quality of the protocol as a product, not optimisation, imo.

@leighmcculloch
Member

  • additional complexity in ... downstream to support all combinations of data

I think it's worth digging into this. Could you elaborate on the downstream complexity?

@mollykarcher

As Leigh said, reducing the data size is not the motivation here, and the description also seems to imply that we would remove the corresponding ledger meta at the same time that we add events. That is something that has not been scoped or discussed in the protocol discussion and would likely have significant downstream impact. It should definitely be raised over in that discussion, not here in a separate issue where fewer people will see it and understand the implications. So far in the protocol discussion, I think there has been the presumption that these events are in addition to any existing ledger meta, not a replacement for it (at least for now / in the short term). Such a change, should we ever want to do it, would also need to be done across multiple protocol releases to ensure backwards compatibility for downstream systems.

The size savings may not be big enough to counter the additional complexity in core and downstream to support all combinations of data

Like Leigh, I'm not sure what you mean here. Events make it simpler for downstream systems to parse this data. Of course, if we don't retroactively emit events for classic, that would complicate things downstream, because systems would need to shift where the data is coming from starting at p23/24. But that point has already been discussed in the other GitHub discussion, and it's my understanding that we will be retroactively emitting events.

I could similarly comment on each of the disadvantages that you list, but it sounds to me like they're built on the assumption I reference above: that events are a replacement for the existing change meta. This point should be hashed out in the other discussion, because it has never been discussed explicitly over there and it was not my assumption that that's what would be happening, so others may have a similar misunderstanding.

@MonsieurNicolas
Contributor Author

The linked issue is about event standardization for transfers; I wanted to have a conversation about what needs to happen at the core layer over a longer timeframe.

The complexity I am talking about is that having toggles at the core layer that control how meta gets produced has a multiplying effect on complexity downstream:
downstream systems will have to support all possible combinations of meta and the various levels of events being produced.

Every time the meta format changes at the core layer, you have to make a choice downstream:

  • regenerate meta for the entire data lake
  • support multiple versions of the meta, yet provide guarantees that nothing breaks because of the partial event dataset

While iterating on events, it seems a lot simpler to just rerun a transform job on the data lake without having to replay all of history.
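To illustrate the "all possible combinations" point, here is a hypothetical sketch (invented names and flags, not actual core or CDP interfaces) of what a downstream extraction path ends up looking like once it must branch on meta version and on whether unified events are present:

```python
from typing import Any, Dict, List

# Hypothetical flags describing what a slice of the data lake contains;
# these names are illustrative only, not actual core or CDP configuration.
class LakeSlice:
    def __init__(self, meta_version: int, has_unified_events: bool):
        self.meta_version = meta_version
        self.has_unified_events = has_unified_events

# Placeholder per-format transforms; a downstream system would have to
# maintain (and keep correct) one of these for every combination it supports.
def parse_unified_events(raw: Any) -> List[Dict]: ...
def derive_from_ledger_changes_v3(raw: Any) -> List[Dict]: ...
def derive_from_legacy_meta(raw: Any) -> List[Dict]: ...

def extract_transfers(slice_: LakeSlice, raw: Any) -> List[Dict]:
    """Each (meta version, events-or-not) combination needs its own code path."""
    if slice_.has_unified_events:
        return parse_unified_events(raw)
    if slice_.meta_version >= 3:
        return derive_from_ledger_changes_v3(raw)
    return derive_from_legacy_meta(raw)
```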
