Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

background overlay: move doing transaction validation in the background before forwarding to tx queue, fast failing invalid transactions #4316

Open
anupsdf opened this issue May 14, 2024 · 5 comments · May be fixed by #4617
Assignees

Comments

@anupsdf
Copy link
Contributor

anupsdf commented May 14, 2024

No description provided.

@marta-lokhova marta-lokhova changed the title background overlay: move doing transaction validation in the background before forwarding to tx queue, which allows fast failing invalid transactions background overlay: move doing transaction validation in the background before forwarding to tx queue, fast failing invalid transactions Jun 5, 2024
@marta-lokhova
Copy link
Contributor

Note that with the switch to BucketList snapshots, we're losing LedgerTxn's internal entry cache, which might be problematic for performance.

@MonsieurNicolas
Copy link
Contributor

Note that with the switch to BucketList snapshots, we're losing LedgerTxn's internal entry cache, which might be problematic for performance.

it sounds like we need a proper performance evaluation to ensure that we don't introduce performance regressions both when closing ledgers (catchup) and when flooding.

@bboston7
Copy link
Contributor

bboston7 commented Oct 9, 2024

Here are 3 options I see, in order of increasing amounts of backgrounding:

  1. Background the signature checks only. With this option, we would check transaction signatures in the background upon receiving them over the network. If signature verification succeeds, we continue with the current Herder::recvTransaction flow in the main thread. Otherwise, we drop the transaction. Then, when core validates the transaction it will achieve a cache hit in the signature cache, thus reducing the total work the main thread has to do. The benefits of this option is that it is small and simplest to get the threading right, while also tackling one of the most expensive parts of transaction validation.
  2. Background TransactionFrame::checkValid. With this option, we would run checkValid on transactions upon receiving them over the network. If they pass, they'll be passed on to the transaction queue in the main thread for the remaining checks and final adding to the queue. The benefit of this option is that it moves even more to the background, with the drawback of requiring even more work to achieve thread safety (for example, checkValid must be refactored to not take an Application).
  3. Background TransactionQueue::canAdd. This option backgrounds all of the checks run when adding a transaction to the queue. The benefit of this option is that it includes some structural checks that (1) and (2) don't, and therefore it preserves the same order of checks that stellar-core currently performs. The drawback is that it requires even more care to achieve safe multithreading, and I suspect that these structural checks do not make up a significant portion of time taken to add a transaction to the queue. This may be beyond the point of diminishing returns for backgrounding transaction validation.

Note that these options only apply to transactions received over the network. There will be no backgrounding of validation for transactions received over the HTTP endpoint. That will preserve the current order of checks (and therefore error codes) for locally submitted transactions.

I plan to use Tracy to figure out what the most expensive portions of transaction validation are to help answer which of these options we should take, but I also wanted to put these out there to gather feedback from anyone who has an opinion on these three choices (or any others that I may have missed).

@marta-lokhova
Copy link
Contributor

marta-lokhova commented Oct 15, 2024

Thanks for putting together the options! I think we can carve out a change that's not too complicated in terms of implementation, but yields a good perf gain. Thinking about this more, a few questions:

  • I think option 3 requires a significant amount of work, since this means we'd need to add synchronization to the whole transactions queue class. What do we get in terms of perf gains? For the error checking, we should be able to re-arrange things to avoid unnecessary synchronization.
  • checkValid should hopefully be fairly simple to make thread-safe given that we use snapshots now. Also, Cleanup apply and validation flows #4471 refactors checkValid to take AppConnector, which forces us to either run it from the main thread, or implement thread-safe methods. That being said, what benefits do we gain from going the checkValid route vs just signature verification?
  • Another idea (a bit more involved, but could yield really good benefits): if we're able to do checkValid in the background, what benefits would be gain from flooding valid transactions right away from the background? This would remove the main thread head-of-line blocking issue that txs face right now. Note that we'd need to do something about the flooding schedule: that is, currently we don't eagerly flood right away, but flood in batches on a schedule (via FLOOD_OP_RATE_PER_LEDGER and FLOOD_TX_PERIOD_MS configs ). The periodic flooding is done from the main thread right now. Implementation-wise, this is closer to option 3, since we'd need to synchronize access to the tx queue, but I think this may yield a pretty dramatic improvement to tx flooding.
  • I agree that some experimentation and understanding of end-to-end latency of transaction processing would be really useful to understand the path forward here.

@marta-lokhova
Copy link
Contributor

Thinking more about synchronization at the transaction queue layer: a simpler approach might be removing overlay dependency on the transaction queue completely. This way, overlay would maintain its own pool of transactions to flood, removing the need to share state with main. I believe that would be a step in the right direction, as it would further isolate the "relay" component of core.

bboston7 added a commit to bboston7/stellar-core that referenced this issue Jan 9, 2025
This is a *draft* change that will resolve stellar#4316 when it is complete.
The change makes `TransactionQueue` thread safe and runs the `tryAdd`
function in the background when the feature is enabled. The
implementation closely follows the
[design document](https://docs.google.com/document/d/1pU__XfEp-rR-17TNsuj-VhY6JfyendaFSYLTiq6tIj4/edit?usp=sharing)
I wrote.  The implementation still requires the main thread to
re-broadcast the transactions (for now). I've opened this PR for
visibility / early feedback on the implementation.

This change is very much a work in progress, with the following tasks
remaining:

* [ ] Fix catchup. I seem to have broken catchup in rebasing these
      changes on master. I need to figure out what is going on there and fix
      it.
* [ ] Fix failing tests. These are failing because they don't update
      `TransactionQueue`s new snapshots correctly.
* [ ] Rigorous testing, both for correctness and performance.
* [ ] I'd like to take a look at pushing the cut-point out a bit to
      enable flooding in the background as well. If this is a relatively
      simple change, I'd like to roll it into this PR. If it looks hairy,
      then I'll leave it for a separate change later.
bboston7 added a commit to bboston7/stellar-core that referenced this issue Jan 9, 2025
This is a *draft* change that will resolve stellar#4316 when it is complete.
The change makes `TransactionQueue` thread safe and runs the `tryAdd`
function in the background when the feature is enabled. The
implementation closely follows the
[design document](https://docs.google.com/document/d/1pU__XfEp-rR-17TNsuj-VhY6JfyendaFSYLTiq6tIj4/edit?usp=sharing)
I wrote.  The implementation still requires the main thread to
re-broadcast the transactions (for now). I've opened this PR for
visibility / early feedback on the implementation.

This change is very much a work in progress, with the following tasks
remaining:

* [ ] Fix catchup. I seem to have broken catchup in rebasing these
      changes on master. I need to figure out what is going on there and fix
      it.
* [ ] Fix failing tests. These are failing because they don't update
      `TransactionQueue`s new snapshots correctly.
* [ ] Rigorous testing, both for correctness and performance.
* [ ] I'd like to take a look at pushing the cut-point out a bit to
      enable flooding in the background as well. If this is a relatively
      simple change, I'd like to roll it into this PR. If it looks hairy,
      then I'll leave it for a separate change later.
@bboston7 bboston7 linked a pull request Jan 9, 2025 that will close this issue
7 tasks
bboston7 added a commit to bboston7/stellar-core that referenced this issue Jan 16, 2025
This is a *draft* change that will resolve stellar#4316 when it is complete.
The change makes `TransactionQueue` thread safe and runs the `tryAdd`
function in the background when the feature is enabled. The
implementation closely follows the
[design document](https://docs.google.com/document/d/1pU__XfEp-rR-17TNsuj-VhY6JfyendaFSYLTiq6tIj4/edit?usp=sharing)
I wrote.  The implementation still requires the main thread to
re-broadcast the transactions (for now). I've opened this PR for
visibility / early feedback on the implementation.

This change is very much a work in progress, with the following tasks
remaining:

* [ ] Fix catchup. I seem to have broken catchup in rebasing these
      changes on master. I need to figure out what is going on there and fix
      it.
* [ ] Fix failing tests. These are failing because they don't update
      `TransactionQueue`s new snapshots correctly.
* [ ] Rigorous testing, both for correctness and performance.
* [ ] I'd like to take a look at pushing the cut-point out a bit to
      enable flooding in the background as well. If this is a relatively
      simple change, I'd like to roll it into this PR. If it looks hairy,
      then I'll leave it for a separate change later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants
@bboston7 @MonsieurNicolas @marta-lokhova @SirTyson @anupsdf and others