-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Improve quorum caching (again) #5761
fix: Improve quorum caching (again) #5761
Conversation
…keepOldConnections`
… values might be CPU/disk heavy
6549784
to
27bef79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
Is this not a breaking change in the sense that the response of some RPCs may have been changed / reduced, in a way that is not fully backwards compatible? Maybe I am mis-understanding the scope of the potential RPC changes. But I think if an RPC previously would have returned say 10 quorums but now only returns 5 that'd be a breaking change no? |
There should be no changes in RPC results, in 6200640 we simply warn users that fast results are only granted for active quorum sets. It's actually an improvement because |
bb47181
to
947c626
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK for squash merge
## Issue being fixed or feature implemented 1. `scanQuorumsCache` is a special one and we use it incorrectly. 2. Platform doesn't really use anything that calls `ScanQuorums()` directly, they specify the exact quorum hash in RPCs so it's `GetQuorum()` that is used instead. The only place `ScanQuorums()` is used for Platform related stuff is `StartCleanupOldQuorumDataThread()` because we want to preserve quorum data used by `GetQuorum()`. But this can be optimised with its own (much more compact) cache. 3. RPCs that use `ScanQuorums()` should in most cases be ok with smaller cache, for other use cases there is a note in help text now. ## What was done? pls see individual commits ## How Has This Been Tested? run tests, run a node (~in progress~ looks stable) ## Breaking Changes n/a ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone
…ms (#5784) ## Issue being fixed or feature implemented Cache population for old quorums is a cpu heavy operation and should be avoided for inactive quorums _at least_ oin `ScanQuorums`. This issue is critical for testnet and other small network because every mn participate in almost every platform quorum and cache population for 2 months of quorums can easily block everything for 15+ minutes on a 4 cpu node. On mainnet quorum distribution is much better but it's still a small waste of cpu (or not so small for unlucky nodes). #5761 follow-up ## What was done? Do not start cache population for outdated quorums, improve logs in `StartCachePopulatorThread` to make it easier to see what's going on. ## How Has This Been Tested? run a mn on testnet ## Breaking Changes n/a ## Checklist: - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_
…ms (dashpay#5784) ## Issue being fixed or feature implemented Cache population for old quorums is a cpu heavy operation and should be avoided for inactive quorums _at least_ oin `ScanQuorums`. This issue is critical for testnet and other small network because every mn participate in almost every platform quorum and cache population for 2 months of quorums can easily block everything for 15+ minutes on a 4 cpu node. On mainnet quorum distribution is much better but it's still a small waste of cpu (or not so small for unlucky nodes). dashpay#5761 follow-up ## What was done? Do not start cache population for outdated quorums, improve logs in `StartCachePopulatorThread` to make it easier to see what's going on. ## How Has This Been Tested? run a mn on testnet ## Breaking Changes n/a ## Checklist: - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_
## Issue being fixed or feature implemented 1. `scanQuorumsCache` is a special one and we use it incorrectly. 2. Platform doesn't really use anything that calls `ScanQuorums()` directly, they specify the exact quorum hash in RPCs so it's `GetQuorum()` that is used instead. The only place `ScanQuorums()` is used for Platform related stuff is `StartCleanupOldQuorumDataThread()` because we want to preserve quorum data used by `GetQuorum()`. But this can be optimised with its own (much more compact) cache. 3. RPCs that use `ScanQuorums()` should in most cases be ok with smaller cache, for other use cases there is a note in help text now. ## What was done? pls see individual commits ## How Has This Been Tested? run tests, run a node (~in progress~ looks stable) ## Breaking Changes n/a ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone
…ms (dashpay#5784) ## Issue being fixed or feature implemented Cache population for old quorums is a cpu heavy operation and should be avoided for inactive quorums _at least_ oin `ScanQuorums`. This issue is critical for testnet and other small network because every mn participate in almost every platform quorum and cache population for 2 months of quorums can easily block everything for 15+ minutes on a 4 cpu node. On mainnet quorum distribution is much better but it's still a small waste of cpu (or not so small for unlucky nodes). dashpay#5761 follow-up ## What was done? Do not start cache population for outdated quorums, improve logs in `StartCachePopulatorThread` to make it easier to see what's going on. ## How Has This Been Tested? run a mn on testnet ## Breaking Changes n/a ## Checklist: - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_
Issue being fixed or feature implemented
scanQuorumsCache
is a special one and we use it incorrectly.ScanQuorums()
directly, they specify the exact quorum hash in RPCs so it'sGetQuorum()
that is used instead. The only placeScanQuorums()
is used for Platform related stuff isStartCleanupOldQuorumDataThread()
because we want to preserve quorum data used byGetQuorum()
. But this can be optimised with its own (much more compact) cache.ScanQuorums()
should in most cases be ok with smaller cache, for other use cases there is a note in help text now.What was done?
pls see individual commits
How Has This Been Tested?
run tests, run a node (
in progresslooks stable)Breaking Changes
n/a
Checklist: